Commit Graph

121 Commits

Author SHA1 Message Date
oumnya 38d61cdf03 fix(mps): force float32 on Apple Silicon to avoid bf16 quality loss
VoxCPM checkpoints default to bfloat16. Following commit e4e0496 which
added MPS device routing, running with `device=mps` selects bf16 on
Apple Silicon. On Metal, bf16 introduces enough numerical drift in the
diffusion AR loop that the synthesized audio is glitched and trips the
model's badcase detector, which retries until the per-call retry budget
is exhausted. Effectively MPS support is unusable in the default config.

This patch adds a single helper, `pick_runtime_dtype(device, dtype)`,
that promotes any low-precision dtype to float32 when the resolved
device is `mps`. CUDA and CPU paths are untouched. An opt-out env var
`VOXCPM_MPS_DTYPE` lets users force a specific dtype on MPS once future
PyTorch / macOS releases improve bf16 stability.

Both VoxCPMModel and VoxCPM2Model adopt the helper in their __init__,
replacing what would otherwise be duplicated inline checks.

Verified locally on Apple M5 Max, PyTorch 2.11, macOS 15:
- VoxCPM2 (2B): clean output, RTF ~0.78 steady state
- VoxCPM 0.5B: clean output, RTF ~0.92
- No badcase retries fired in any test
- VOXCPM_MPS_DTYPE=bfloat16 round-trips and reproduces the original
  glitched output, confirming the override path.
2026-04-15 12:22:56 +08:00
刘鑫 1565e83efe fix: complete shared generator cleanup coverage
Move generator close handling into a shared utility and wire the core generation pipeline through it so partially-consumed prompt cache generators are cleaned up consistently across both model variants and the public VoxCPM wrapper.

Made-with: Cursor
2026-04-13 17:39:05 +08:00
刘鑫 61b36d4e56 refactor: centralize generator cleanup in model helpers
Factor repeated next-and-close patterns into a shared helper in both VoxCPM model variants so non-streaming inference cleans up generators consistently while keeping the issue reference close to the workaround.

Made-with: Cursor
2026-04-13 16:57:08 +08:00
刘鑫 b1584aec7c fix: stabilize CPU SDPA mask broadcasting
Use an explicit broadcastable attention mask shape during MiniCPM incremental decoding so CPU runtimes avoid a PyTorch SDPA dimension error without changing attention semantics.

Made-with: Cursor
2026-04-13 15:38:53 +08:00
xliucs 5510503182 Merge pull request #246 from sharziki/fix/unclosed-file-handles
fix: close file handles in from_local() config loading
2026-04-11 13:10:04 +08:00
sharziki fb46aad9a5 fix: close file handles in from_local() config loading
Use context managers when reading config.json in VoxCPMModel.from_local()
and VoxCPM2Model.from_local() to prevent file descriptor leaks. Also add
explicit encoding="utf-8" to avoid locale-dependent decode errors.

Closes #235

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 00:01:14 -04:00
刘鑫 e4e049624c update finetuning pipeline and runtime device handling
Support optional ref_audio samples in finetuning and make runtime device selection explicit while keeping auto fallback behavior consistent. Also ignore the local app override file to avoid accidental commits.

Made-with: Cursor
2026-04-11 11:08:50 +08:00
xliucs abf01b9bf3 Merge pull request #229 from kuishou68/fix/issue-228-validate-text-type-order
fix: correct isinstance/strip order in _generate() to prevent AttributeError on non-string input
2026-04-10 10:30:15 +08:00
cocoon 4f4a5b9f6c fix: correct type-check order in _generate() to prevent AttributeError on non-string input
The previous guard `not text.strip() or not isinstance(text, str)` called
.strip() before verifying that text is actually a string, causing an
AttributeError (e.g. for int input) instead of the intended ValueError.

Swap operand order so isinstance check short-circuits first.

Closes #228
2026-04-09 16:13:40 +00:00
刘鑫 79c0cf68dd chore: remove accidentally committed app_local.py
Made-with: Cursor
2026-04-09 16:05:18 +08:00
刘鑫 75cfa3e9b8 fix: use uncompiled feat_encoder for prefill to prevent CUDA Graph dynamic shape accumulation (#209) 2026-04-09 16:00:17 +08:00
Labmem-Zhouyx 5611bd08a0 optim app.py 2026-04-09 00:30:19 +08:00
Labmem-Zhouyx 364eff6840 update readme: python version 2026-04-08 23:07:38 +08:00
Labmem-Zhouyx 6d10932b09 update readme 2026-04-08 18:48:58 +08:00
Labmem-Zhouyx 68af4fe502 fix: ft log and setting 2.0.2 2026-04-08 18:15:17 +08:00
Labmem-Zhouyx ee3649c1b3 fix: streaming decode 2026-04-08 17:25:54 +08:00
Labmem-Zhouyx 82d77d445c fix: decode chunksize for audiovae_v2 2026-04-08 16:31:36 +08:00
Labmem-Zhouyx 8f95d13073 update readme: 30-language asr result on internal benchmark 2026-04-08 15:36:56 +08:00
Labmem-Zhouyx df38f0a167 update readme for modelscope download 2.0.1 2026-04-08 11:29:19 +08:00
Labmem-Zhouyx 9adfaf6996 update demo for zh 2026-04-08 00:15:16 +08:00
刘鑫 46cfce0c97 fix VoxCPM2 training sample_rate: 48000 -> 16000 (match AudioVAE encoder)
Made-with: Cursor
2026-04-07 22:59:18 +08:00
Labmem-Zhouyx da700f264e update ZH readme 2026-04-07 18:04:56 +08:00
Labmem-Zhouyx 9da570d409 remove wechat link 2026-04-07 15:29:12 +08:00
Labmem-Zhouyx 9374524c47 update readme 2026-04-06 23:01:16 +08:00
Labmem-Zhouyx ec6d30e996 update readme 2026-04-06 22:56:06 +08:00
Labmem-Zhouyx a010d621ff update readme
Made-with: Cursor
2.0.0
2026-04-06 22:09:24 +08:00
Dennis Huang 3f005b0dbd Enhance README formatting and community section for better visibility 2026-04-06 19:50:29 +08:00
Labmem-Zhouyx 039c6e9f92 update 2026-04-06 17:15:10 +08:00
Dennis Huang 5734ab36b6 Update README 2026-04-06 16:24:12 +08:00
Labmem-Zhouyx 746631c38d update 2026-04-06 16:10:50 +08:00
Labmem-Zhouyx 07b8b5c01f update readme 2026-04-06 15:53:58 +08:00
Labmem-Zhouyx f738cc9946 update 2026-04-03 18:46:29 +08:00
Labmem-Zhouyx 0c2cf23617 Update app.py UI, adjust streaming_prefix_len, remove legacy docs
- Refine app.py: Ultimate Cloning naming, NFE slider, i18n polish
- Change streaming_prefix_len default from 3 to 4 for smoother decoding
- Remove legacy docs/ directory (migrated to ReadTheDocs)

Made-with: Cursor
2026-04-03 18:42:41 +08:00
Labmem-Zhouyx b823d8107c Merge branch 'dev_2.0' of https://github.com/OpenBMB/VoxCPM into dev_2.0 2026-04-03 17:44:46 +08:00
刘鑫 a87739426f add voxcpm2 finetune conf 2026-04-03 14:23:15 +08:00
Labmem-Zhouyx 12c2b8ff98 update readme 2026-04-02 21:01:23 +08:00
刘鑫 30c300cfe8 adjust default cfg range 2026-04-02 18:14:35 +08:00
刘鑫 addee2c550 surport voxcpm2 cli 2026-04-01 21:15:55 +08:00
Labmem-Zhouyx 42c428164c feat: add no_rope support for residual LM and fix streaming continuation decoding
- Add `residual_lm_no_rope` config option in VoxCPMConfig and propagate to MiniCPMModel
- Add `no_rope` field to MiniCPM4Config; make RoPE embedding optional in MiniCPMModel and MiniCPMAttention
- Add `streaming_prefix_len` parameter to generation interface
- Fix non-streaming audio decode in continuation mode to trim leading prefix patches consistently
- Refactor streaming prefix context preparation: distinguish continuation vs. zero-shot via feat_mask trailing bit instead of audio_mask sum

Made-with: Cursor
2026-03-31 17:07:33 +08:00
刘鑫 d9cf376e16 update voxcpm2 2026-03-31 11:50:37 +08:00
刘鑫 23ed7ffeee fix: fix some bugs in resuming multi-GPU training 2026-03-13 18:43:07 +08:00
xliucs 7823e14b82 Merge pull request #188 from haosenwang1018/fix/bare-excepts
fix: use specific exceptions instead of bare except
2026-03-03 11:49:00 +08:00
haosenwang1018 8df79de636 fix: use specific exceptions instead of bare except
- lora_ft_webui.py: except (JSONDecodeError, OSError) for config file
- voxcpm.py: except ImportError for triton availability check
2026-02-24 22:19:45 +00:00
xliucs acaadb19e9 Merge pull request #186 from symhsym/patch-1
Update train_voxcpm_finetune.py
2026-02-11 18:05:39 +08:00
symhsym 07e526a231 Update train_voxcpm_finetune.py
修改了issue#185中提到的问题,在训练时进行validate会对原模型执行to(torch.bfloat16)然后to(torch.float32)的操作,这样可能导致模型数值浮动,因此这个修改让validate步骤保留原模型数值
2026-02-11 11:17:47 +08:00
xliucs 7aadc6c94e Merge pull request #161 from s3ldc/cli-arg-validation
Improve CLI argument validation and help text
2026-01-24 13:06:30 +08:00
Biriy 8f3a91cac8 cli: improve argument validation and help text for VoxCPM CLI 2026-01-20 14:33:58 +05:30
xliucs e72fb42c38 Merge pull request #147 from zanellig/main
Fix README's feature checkboxes
2026-01-19 12:41:22 +08:00
Gonzalo Zanelli 6dd63a534f fix: feature checkboxes 2026-01-18 18:03:24 -03:00
刘鑫 79e75f259e Fix: optimize save ckpt function 2026-01-16 16:22:34 +08:00