VoxCPM

Author	SHA1	Message	Date
Kevin Knoedler	66205135fc	perf: stateful streaming VAE decode - eliminate redundant overlap. Streaming decode previously re-decoded 4 overlapping patches through the VAE each step, discarding 75% of the output. Replace with stateful decode that carries causal conv padding buffers between calls: one patch in, one patch out, no overlap. Changes: (1) Add StreamingVAEDecoder to audiovae/audio_vae_v2.py, which caches CausalConv1d and CausalTransposeConv1d left-pad state between calls; (2) AudioVAE.streaming_decode() context manager for clean lifecycle; (3) _inference yields single-patch latents in streaming mode; (4) _generate and _generate_with_prompt_cache use StreamingVAEDecoder. Streaming VAE decode time (isolated): 289ms → 148ms (2x faster). Stateful vs full decode: cosine 1.0000, max diff 0.0005 (more accurate than previous overlap approach at max diff 0.001). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 09:09:22 -07:00
Labmem-Zhouyx	ee3649c1b3	fix: streaming decode	2026-04-08 17:25:54 +08:00
Labmem-Zhouyx	82d77d445c	fix: decode chunksize for audiovae_v2	2026-04-08 16:31:36 +08:00
Labmem-Zhouyx	df38f0a167	update readme for modelscope download	2026-04-08 11:29:19 +08:00
Labmem-Zhouyx	0c2cf23617	Update app.py UI, adjust streaming_prefix_len, remove legacy docs: (1) Refine app.py: Ultimate Cloning naming, NFE slider, i18n polish; (2) Change streaming_prefix_len default from 3 to 4 for smoother decoding; (3) Remove legacy docs/ directory (migrated to ReadTheDocs). Made-with: Cursor	2026-04-03 18:42:41 +08:00
Labmem-Zhouyx	42c428164c	feat: add no_rope support for residual LM and fix streaming continuation decoding: (1) Add `residual_lm_no_rope` config option in VoxCPMConfig and propagate to MiniCPMModel; (2) Add `no_rope` field to MiniCPM4Config; make RoPE embedding optional in MiniCPMModel and MiniCPMAttention; (3) Add `streaming_prefix_len` parameter to generation interface; (4) Fix non-streaming audio decode in continuation mode to trim leading prefix patches consistently; (5) Refactor streaming prefix context preparation: distinguish continuation vs. zero-shot via feat_mask trailing bit instead of audio_mask sum. Made-with: Cursor	2026-03-31 17:07:33 +08:00
刘鑫	d9cf376e16	update voxcpm2	2026-03-31 11:50:37 +08:00

7 Commits
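
Note on commit 66205135fc: the idea of carrying causal-convolution padding between calls can be illustrated with a small sketch. This is not the repository's StreamingVAEDecoder; it is a pure-Python toy that assumes a single-channel, stride-1 causal convolution, and the names `causal_conv1d` and `StatefulCausalConv1d` are hypothetical. It only shows why caching the last `k - 1` input samples lets each chunk be processed once, with no overlapping re-decode, while matching the full-sequence result.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Full-sequence causal convolution: left-pad the input with k-1 zeros."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


class StatefulCausalConv1d:
    """Toy streaming variant (hypothetical): keeps the left context between calls."""

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        # Before the first chunk the left context is all zeros,
        # exactly like the zero padding in the full-sequence version.
        self.state: List[float] = [0.0] * (len(self.kernel) - 1)

    def __call__(self, chunk: Sequence[float]) -> List[float]:
        k = len(self.kernel)
        padded = self.state + list(chunk)
        out = [
            sum(self.kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(chunk))
        ]
        # Keep the trailing k-1 samples as the next call's left padding.
        self.state = padded[len(padded) - (k - 1):]
        return out


def stream(signal: Sequence[float], kernel: Sequence[float], chunk_size: int) -> List[float]:
    """Feed the signal chunk by chunk: one chunk in, one chunk out."""
    conv = StatefulCausalConv1d(kernel)
    result: List[float] = []
    for start in range(0, len(signal), chunk_size):
        result.extend(conv(signal[start:start + chunk_size]))
    return result


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    kernel = [0.5, 0.25, 0.25]
    # Streaming in chunks of 2 should reproduce the full-sequence output.
    assert stream(signal, kernel, 2) == causal_conv1d(signal, kernel)
```

In this toy, the cached state plays the role the commit message assigns to the left-pad buffers of CausalConv1d and CausalTransposeConv1d; a real decoder would hold one such buffer per layer and would also have to handle strides and transposed convolutions, which the sketch leaves out.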