Commit Graph

127 Commits

Author SHA1 Message Date
gluttony-10 d3cc88722c feat: enhance control text processing in VoxCPMDemo
Added regex to strip parentheses from control instructions in the text synthesis method to ensure compatibility with the expected prompt format. This change improves the robustness of the input handling.
2026-04-21 07:07:24 +00:00
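
A minimal sketch of the kind of normalization commit d3cc88722c describes, assuming the control instruction may arrive wrapped in ASCII or full-width parentheses and that only the bracket characters are removed. The function name `strip_control_parens` and the pattern are hypothetical; the regex actually used in VoxCPMDemo may differ.

```python
import re

# Hypothetical pattern: matches ASCII "(" ")" and full-width "(" ")".
_PARENS = re.compile(r"[()()]")


def strip_control_parens(control: str) -> str:
    """Remove parenthesis characters and trim surrounding whitespace."""
    return _PARENS.sub("", control).strip()


cleaned = strip_control_parens("(whisper, slow)")
```

The text inside the brackets is kept, so a user who types the instruction with or without parentheses gets the same string.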
xliucs 13605c5a0e Merge pull request #266 from linyueqian/docs/add-vllm-omni-references
docs: add vLLM-Omni serving references
2026-04-17 10:46:21 +08:00
Yueqian Lin afa63e6195 docs: add vLLM-Omni serving references
Document vLLM-Omni as a production serving option for VoxCPM2
alongside the existing Nano-vLLM reference. Mirrors the addition in
README_zh.md, and adds an ecosystem table entry.

Install snippet follows the upstream vLLM-Omni installation guide
(from source, since vllm-omni is rapidly evolving).

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
2026-04-16 21:19:27 -05:00
liuxin eae0a29908 docs: add ComfyUI RH link
Made-with: Cursor
2026-04-16 11:46:40 +08:00
Labmem-Zhouyx 35895982d7 Merge PR #212: perf: stateful streaming VAE decode — eliminate redundant overlap
- StreamingVAEDecoder caches CausalConv1d/CausalTransposeConv1d left-pad
  state between calls — one patch in, one patch out, no overlap
- _inference yields single-patch latents in streaming mode
- 2x faster streaming VAE decode, more accurate (max diff 0.0005 vs 0.0011)
2026-04-15 16:01:38 +08:00
Labmem-Zhouyx f7f1b78c4d fix: correct transpose conv context 2026-04-15 16:01:02 +08:00
刘鑫 1565e83efe fix: complete shared generator cleanup coverage
Move generator close handling into a shared utility and wire the core generation pipeline through it, so that partially consumed prompt-cache generators are cleaned up consistently across both model variants and the public VoxCPM wrapper.

Made-with: Cursor
2026-04-13 17:39:05 +08:00
刘鑫 61b36d4e56 refactor: centralize generator cleanup in model helpers
Factor repeated next-and-close patterns into a shared helper in both VoxCPM model variants so non-streaming inference cleans up generators consistently while keeping the issue reference close to the workaround.

Made-with: Cursor
2026-04-13 16:57:08 +08:00
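
The "next-and-close" pattern mentioned in commit 61b36d4e56 can be sketched as below. This is an illustration only: the helper name `next_and_close` and the demo generator are assumptions, not the repository's actual helper.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def next_and_close(gen: Iterator[T]) -> T:
    """Return the first item of `gen`, then close it so its cleanup code runs."""
    try:
        return next(gen)
    finally:
        # Plain iterators have no close(); generators do.
        close = getattr(gen, "close", None)
        if callable(close):
            close()


def _demo_generator(log: List[str]):
    """Hypothetical generator whose `finally` block stands in for resource cleanup."""
    try:
        yield "first"
        yield "second"
    finally:
        log.append("closed")


log: List[str] = []
result = next_and_close(_demo_generator(log))
```

Closing explicitly means the generator's `finally` block runs right away instead of waiting for garbage collection, which is the point of routing every non-streaming call through one helper.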
刘鑫 b1584aec7c fix: stabilize CPU SDPA mask broadcasting
Use an explicit broadcastable attention mask shape during MiniCPM incremental decoding so CPU runtimes avoid a PyTorch SDPA dimension error without changing attention semantics.

Made-with: Cursor
2026-04-13 15:38:53 +08:00
xliucs 5510503182 Merge pull request #246 from sharziki/fix/unclosed-file-handles
fix: close file handles in from_local() config loading
2026-04-11 13:10:04 +08:00
sharziki fb46aad9a5 fix: close file handles in from_local() config loading
Use context managers when reading config.json in VoxCPMModel.from_local()
and VoxCPM2Model.from_local() to prevent file descriptor leaks. Also add
explicit encoding="utf-8" to avoid locale-dependent decode errors.

Closes #235

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 00:01:14 -04:00
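
A sketch of the pattern commit fb46aad9a5 describes: read `config.json` inside a context manager with an explicit encoding. The function name `load_config` is hypothetical; the real `from_local()` methods do more than this.

```python
import json
import os
import tempfile


def load_config(model_dir: str) -> dict:
    """Read config.json; the `with` block closes the file even if parsing fails."""
    config_path = os.path.join(model_dir, "config.json")
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f)


# Usage with a throwaway directory and a non-ASCII value.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "config.json"), "w", encoding="utf-8") as f:
        json.dump({"name": "示例"}, f, ensure_ascii=False)
    config = load_config(tmp)
```

Passing `encoding="utf-8"` keeps the result independent of the platform's default locale encoding.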
刘鑫 e4e049624c update finetuning pipeline and runtime device handling
Support optional ref_audio samples in finetuning and make runtime device selection explicit while keeping auto fallback behavior consistent. Also ignore the local app override file to avoid accidental commits.

Made-with: Cursor
2026-04-11 11:08:50 +08:00
xliucs abf01b9bf3 Merge pull request #229 from kuishou68/fix/issue-228-validate-text-type-order
fix: correct isinstance/strip order in _generate() to prevent AttributeError on non-string input
2026-04-10 10:30:15 +08:00
cocoon 4f4a5b9f6c fix: correct type-check order in _generate() to prevent AttributeError on non-string input
The previous guard `not text.strip() or not isinstance(text, str)` called
.strip() before verifying that text is actually a string, causing an
AttributeError (e.g. for int input) instead of the intended ValueError.

Swap operand order so isinstance check short-circuits first.

Closes #228
2026-04-09 16:13:40 +00:00
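
The operand-order issue in commit 4f4a5b9f6c can be reproduced in isolation. The two function names and the error message are hypothetical; only the guard expressions follow the commit message.

```python
def validate_text_buggy(text):
    # .strip() is evaluated first, so a non-string raises AttributeError
    # before the isinstance check is ever reached.
    if not text.strip() or not isinstance(text, str):
        raise ValueError("target text must be a non-empty string")
    return text


def validate_text_fixed(text):
    # isinstance short-circuits first, so .strip() only runs on real strings.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("target text must be a non-empty string")
    return text
```

With an `int` argument the first version raises `AttributeError`, while the second raises the intended `ValueError`.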
刘鑫 79c0cf68dd chore: remove accidentally committed app_local.py
Made-with: Cursor
2026-04-09 16:05:18 +08:00
刘鑫 75cfa3e9b8 fix: use uncompiled feat_encoder for prefill to prevent CUDA Graph dynamic shape accumulation (#209) 2026-04-09 16:00:17 +08:00
Labmem-Zhouyx 5611bd08a0 optimize app.py 2026-04-09 00:30:19 +08:00
Kevin Knoedler 66205135fc perf: stateful streaming VAE decode — eliminate redundant overlap
Streaming decode previously re-decoded 4 overlapping patches through
the VAE each step, discarding 75% of the output. Replace with stateful
decode that carries causal conv padding buffers between calls — one
patch in, one patch out, no overlap.

Changes:
- Add StreamingVAEDecoder to audiovae/audio_vae_v2.py — caches
  CausalConv1d and CausalTransposeConv1d left-pad state between calls
- AudioVAE.streaming_decode() context manager for clean lifecycle
- _inference yields single-patch latents in streaming mode
- _generate and _generate_with_prompt_cache use StreamingVAEDecoder

Streaming VAE decode time (isolated): 289ms → 148ms (2x faster)
Stateful vs full decode: cosine 1.0000, max diff 0.0005
(more accurate than the previous overlap approach, whose max diff was 0.001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:09:22 -07:00
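
The idea behind commit 66205135fc, carrying the causal left-pad between calls so that each chunk is processed exactly once, can be shown with a toy scalar convolution. This is not the repository's `StreamingVAEDecoder`: the class below is a hypothetical pure-Python stand-in for a single causal conv layer, with no transpose conv and no tensors.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Full-sequence causal convolution with zero left-padding."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


class StreamingCausalConv1d:
    """Processes one chunk per call, caching the last k-1 inputs as left context."""

    def __init__(self, kernel: Sequence[float]):
        self.kernel = list(kernel)
        self.state = [0.0] * (len(self.kernel) - 1)  # initial zero left-pad

    def __call__(self, chunk: Sequence[float]) -> List[float]:
        k = len(self.kernel)
        padded = self.state + list(chunk)
        out = [
            sum(self.kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(chunk))
        ]
        # Keep the trailing k-1 samples as the left-pad for the next call.
        self.state = padded[len(padded) - (k - 1):]
        return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [0.5, 0.25, 0.25]

full = causal_conv1d(signal, kernel)

stream = StreamingCausalConv1d(kernel)
streamed: List[float] = []
for start in range(0, len(signal), 2):
    streamed += stream(signal[start:start + 2])
```

Because the cached state supplies exactly the context that overlap would otherwise recompute, the chunked output matches the full-sequence output without decoding any sample twice.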
Labmem-Zhouyx 364eff6840 update readme: python version 2026-04-08 23:07:38 +08:00
Labmem-Zhouyx 6d10932b09 update readme 2026-04-08 18:48:58 +08:00
Labmem-Zhouyx 68af4fe502 fix: ft log and setting (tag: 2.0.2) 2026-04-08 18:15:17 +08:00
Labmem-Zhouyx ee3649c1b3 fix: streaming decode 2026-04-08 17:25:54 +08:00
Labmem-Zhouyx 82d77d445c fix: decode chunksize for audiovae_v2 2026-04-08 16:31:36 +08:00
Labmem-Zhouyx 8f95d13073 update readme: 30-language ASR results on internal benchmark 2026-04-08 15:36:56 +08:00
Labmem-Zhouyx df38f0a167 update readme for modelscope download (tag: 2.0.1) 2026-04-08 11:29:19 +08:00
Labmem-Zhouyx 9adfaf6996 update demo for zh 2026-04-08 00:15:16 +08:00
刘鑫 46cfce0c97 fix VoxCPM2 training sample_rate: 48000 -> 16000 (match AudioVAE encoder)
Made-with: Cursor
2026-04-07 22:59:18 +08:00
Labmem-Zhouyx da700f264e update ZH readme 2026-04-07 18:04:56 +08:00
Labmem-Zhouyx 9da570d409 remove wechat link 2026-04-07 15:29:12 +08:00
Labmem-Zhouyx 9374524c47 update readme 2026-04-06 23:01:16 +08:00
Labmem-Zhouyx ec6d30e996 update readme 2026-04-06 22:56:06 +08:00
Labmem-Zhouyx a010d621ff update readme
Made-with: Cursor
(tag: 2.0.0)
2026-04-06 22:09:24 +08:00
Dennis Huang 3f005b0dbd Enhance README formatting and community section for better visibility 2026-04-06 19:50:29 +08:00
Labmem-Zhouyx 039c6e9f92 update 2026-04-06 17:15:10 +08:00
Dennis Huang 5734ab36b6 Update README 2026-04-06 16:24:12 +08:00
Labmem-Zhouyx 746631c38d update 2026-04-06 16:10:50 +08:00
Labmem-Zhouyx 07b8b5c01f update readme 2026-04-06 15:53:58 +08:00
Labmem-Zhouyx f738cc9946 update 2026-04-03 18:46:29 +08:00
Labmem-Zhouyx 0c2cf23617 Update app.py UI, adjust streaming_prefix_len, remove legacy docs
- Refine app.py: Ultimate Cloning naming, NFE slider, i18n polish
- Change streaming_prefix_len default from 3 to 4 for smoother decoding
- Remove legacy docs/ directory (migrated to ReadTheDocs)

Made-with: Cursor
2026-04-03 18:42:41 +08:00
Labmem-Zhouyx b823d8107c Merge branch 'dev_2.0' of https://github.com/OpenBMB/VoxCPM into dev_2.0 2026-04-03 17:44:46 +08:00
刘鑫 a87739426f add voxcpm2 finetune conf 2026-04-03 14:23:15 +08:00
Labmem-Zhouyx 12c2b8ff98 update readme 2026-04-02 21:01:23 +08:00
刘鑫 30c300cfe8 adjust default cfg range 2026-04-02 18:14:35 +08:00
刘鑫 addee2c550 support voxcpm2 cli 2026-04-01 21:15:55 +08:00
Labmem-Zhouyx 42c428164c feat: add no_rope support for residual LM and fix streaming continuation decoding
- Add `residual_lm_no_rope` config option in VoxCPMConfig and propagate to MiniCPMModel
- Add `no_rope` field to MiniCPM4Config; make RoPE embedding optional in MiniCPMModel and MiniCPMAttention
- Add `streaming_prefix_len` parameter to generation interface
- Fix non-streaming audio decode in continuation mode to trim leading prefix patches consistently
- Refactor streaming prefix context preparation: distinguish continuation vs. zero-shot via feat_mask trailing bit instead of audio_mask sum

Made-with: Cursor
2026-03-31 17:07:33 +08:00
刘鑫 d9cf376e16 update voxcpm2 2026-03-31 11:50:37 +08:00
刘鑫 23ed7ffeee fix: fix some bugs in resuming multi-GPU training 2026-03-13 18:43:07 +08:00
xliucs 7823e14b82 Merge pull request #188 from haosenwang1018/fix/bare-excepts
fix: use specific exceptions instead of bare except
2026-03-03 11:49:00 +08:00
haosenwang1018 8df79de636 fix: use specific exceptions instead of bare except
- lora_ft_webui.py: except (JSONDecodeError, OSError) for config file
- voxcpm.py: except ImportError for triton availability check
2026-02-24 22:19:45 +00:00
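
A sketch of the narrowing described in commit 8df79de636: catch only the exceptions the code expects instead of using a bare `except`. The helper names `load_json_config` and `has_module` are hypothetical and do not mirror the functions in lora_ft_webui.py or voxcpm.py.

```python
import json


def load_json_config(path: str, default: dict) -> dict:
    """Return the parsed config, or `default` if the file is missing or malformed."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError):
        # Only I/O and JSON syntax problems fall back to the default;
        # anything else (e.g. KeyboardInterrupt) still propagates.
        return default


def has_module(name: str) -> bool:
    """Availability check for an optional dependency such as triton."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False
```

A bare `except` would also swallow `KeyboardInterrupt` and `SystemExit`, and would hide unrelated bugs behind the fallback path.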
xliucs acaadb19e9 Merge pull request #186 from symhsym/patch-1
Update train_voxcpm_finetune.py
2026-02-11 18:05:39 +08:00