VoxCPM

Author	SHA1	Message	Date
Oumnya	96d605b9de	fix(mps): align VOXCPM_MPS_DTYPE override set with get_dtype parser Drop "half" from _VALID_DTYPE_OVERRIDES / _LOW_PRECISION_DTYPES. get_dtype() has never accepted "half", so VOXCPM_MPS_DTYPE=half would pass override validation and then crash downstream with "Unsupported dtype: half". The remaining aliases (bfloat16/bf16, float16/fp16, float32/fp32) already cover the intended dtype space. Adds a standalone unit check under scripts/ to guard the invariant that every accepted override parses through get_dtype(). Addresses review feedback on #263.	2026-04-21 18:24:53 +08:00
oumnya	38d61cdf03	fix(mps): force float32 on Apple Silicon to avoid bf16 quality loss VoxCPM checkpoints default to bfloat16. Following commit `e4e0496` which added MPS device routing, running with `device=mps` selects bf16 on Apple Silicon. On Metal, bf16 introduces enough numerical drift in the diffusion AR loop that the synthesized audio is glitched and trips the model's badcase detector, which retries until the per-call retry budget is exhausted. Effectively MPS support is unusable in the default config. This patch adds a single helper, `pick_runtime_dtype(device, dtype)`, that promotes any low-precision dtype to float32 when the resolved device is `mps`. CUDA and CPU paths are untouched. An opt-out env var `VOXCPM_MPS_DTYPE` lets users force a specific dtype on MPS once future PyTorch / macOS releases improve bf16 stability. Both VoxCPMModel and VoxCPM2Model adopt the helper in their __init__, replacing what would otherwise be duplicated inline checks. Verified locally on Apple M5 Max, PyTorch 2.11, macOS 15: - VoxCPM2 (2B): clean output, RTF ~0.78 steady state - VoxCPM 0.5B: clean output, RTF ~0.92 - No badcase retries fired in any test - VOXCPM_MPS_DTYPE=bfloat16 round-trips and reproduces the original glitched output, confirming the override path.	2026-04-15 12:22:56 +08:00
刘鑫	1565e83efe	fix: complete shared generator cleanup coverage Move generator close handling into a shared utility and wire the core generation pipeline through it so partially-consumed prompt cache generators are cleaned up consistently across both model variants and the public VoxCPM wrapper. Made-with: Cursor	2026-04-13 17:39:05 +08:00
刘鑫	61b36d4e56	refactor: centralize generator cleanup in model helpers Factor repeated next-and-close patterns into a shared helper in both VoxCPM model variants so non-streaming inference cleans up generators consistently while keeping the issue reference close to the workaround. Made-with: Cursor	2026-04-13 16:57:08 +08:00
刘鑫	b1584aec7c	fix: stabilize CPU SDPA mask broadcasting Use an explicit broadcastable attention mask shape during MiniCPM incremental decoding so CPU runtimes avoid a PyTorch SDPA dimension error without changing attention semantics. Made-with: Cursor	2026-04-13 15:38:53 +08:00
xliucs	5510503182	Merge pull request #246 from sharziki/fix/unclosed-file-handles fix: close file handles in from_local() config loading	2026-04-11 13:10:04 +08:00
sharziki	fb46aad9a5	fix: close file handles in from_local() config loading Use context managers when reading config.json in VoxCPMModel.from_local() and VoxCPM2Model.from_local() to prevent file descriptor leaks. Also add explicit encoding="utf-8" to avoid locale-dependent decode errors. Closes #235 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 00:01:14 -04:00
刘鑫	e4e049624c	update finetuning pipeline and runtime device handling Support optional ref_audio samples in finetuning and make runtime device selection explicit while keeping auto fallback behavior consistent. Also ignore the local app override file to avoid accidental commits. Made-with: Cursor	2026-04-11 11:08:50 +08:00
xliucs	abf01b9bf3	Merge pull request #229 from kuishou68/fix/issue-228-validate-text-type-order fix: correct isinstance/strip order in _generate() to prevent AttributeError on non-string input	2026-04-10 10:30:15 +08:00
cocoon	4f4a5b9f6c	fix: correct type-check order in _generate() to prevent AttributeError on non-string input The previous guard `not text.strip() or not isinstance(text, str)` called .strip() before verifying that text is actually a string, causing an AttributeError (e.g. for int input) instead of the intended ValueError. Swap operand order so isinstance check short-circuits first. Closes #228	2026-04-09 16:13:40 +00:00
刘鑫	79c0cf68dd	chore: remove accidentally committed app_local.py Made-with: Cursor	2026-04-09 16:05:18 +08:00
刘鑫	75cfa3e9b8	fix: use uncompiled feat_encoder for prefill to prevent CUDA Graph dynamic shape accumulation (#209)	2026-04-09 16:00:17 +08:00
Labmem-Zhouyx	5611bd08a0	optim app.py	2026-04-09 00:30:19 +08:00
Labmem-Zhouyx	364eff6840	update readme: python version	2026-04-08 23:07:38 +08:00
Labmem-Zhouyx	6d10932b09	update readme	2026-04-08 18:48:58 +08:00
Labmem-Zhouyx	68af4fe502	fix: ft log and setting 2.0.2	2026-04-08 18:15:17 +08:00
Labmem-Zhouyx	ee3649c1b3	fix: streaming decode	2026-04-08 17:25:54 +08:00
Labmem-Zhouyx	82d77d445c	fix: decode chunksize for audiovae_v2	2026-04-08 16:31:36 +08:00
Labmem-Zhouyx	8f95d13073	update readme: 30-language asr result on internal benchmark	2026-04-08 15:36:56 +08:00
Labmem-Zhouyx	df38f0a167	update readme for modelscope download 2.0.1	2026-04-08 11:29:19 +08:00
Labmem-Zhouyx	9adfaf6996	update demo for zh	2026-04-08 00:15:16 +08:00
刘鑫	46cfce0c97	fix VoxCPM2 training sample_rate: 48000 -> 16000 (match AudioVAE encoder) Made-with: Cursor	2026-04-07 22:59:18 +08:00
Labmem-Zhouyx	da700f264e	update ZH readme	2026-04-07 18:04:56 +08:00
Labmem-Zhouyx	9da570d409	remove wechat link	2026-04-07 15:29:12 +08:00
Labmem-Zhouyx	9374524c47	update readme	2026-04-06 23:01:16 +08:00
Labmem-Zhouyx	ec6d30e996	update readme	2026-04-06 22:56:06 +08:00
Labmem-Zhouyx	a010d621ff	update readme Made-with: Cursor 2.0.0	2026-04-06 22:09:24 +08:00
Dennis Huang	3f005b0dbd	Enhance README formatting and community section for better visibility	2026-04-06 19:50:29 +08:00
Labmem-Zhouyx	039c6e9f92	update	2026-04-06 17:15:10 +08:00
Dennis Huang	5734ab36b6	Update README	2026-04-06 16:24:12 +08:00
Labmem-Zhouyx	746631c38d	update	2026-04-06 16:10:50 +08:00
Labmem-Zhouyx	07b8b5c01f	update readme	2026-04-06 15:53:58 +08:00
Labmem-Zhouyx	f738cc9946	update	2026-04-03 18:46:29 +08:00
Labmem-Zhouyx	0c2cf23617	Update app.py UI, adjust streaming_prefix_len, remove legacy docs - Refine app.py: Ultimate Cloning naming, NFE slider, i18n polish - Change streaming_prefix_len default from 3 to 4 for smoother decoding - Remove legacy docs/ directory (migrated to ReadTheDocs) Made-with: Cursor	2026-04-03 18:42:41 +08:00
Labmem-Zhouyx	b823d8107c	Merge branch 'dev_2.0' of https://github.com/OpenBMB/VoxCPM into dev_2.0	2026-04-03 17:44:46 +08:00
刘鑫	a87739426f	add voxcpm2 finetune conf	2026-04-03 14:23:15 +08:00
Labmem-Zhouyx	12c2b8ff98	update readme	2026-04-02 21:01:23 +08:00
刘鑫	30c300cfe8	adjust default cfg range	2026-04-02 18:14:35 +08:00
刘鑫	addee2c550	support voxcpm2 cli	2026-04-01 21:15:55 +08:00
Labmem-Zhouyx	42c428164c	feat: add no_rope support for residual LM and fix streaming continuation decoding - Add `residual_lm_no_rope` config option in VoxCPMConfig and propagate to MiniCPMModel - Add `no_rope` field to MiniCPM4Config; make RoPE embedding optional in MiniCPMModel and MiniCPMAttention - Add `streaming_prefix_len` parameter to generation interface - Fix non-streaming audio decode in continuation mode to trim leading prefix patches consistently - Refactor streaming prefix context preparation: distinguish continuation vs. zero-shot via feat_mask trailing bit instead of audio_mask sum Made-with: Cursor	2026-03-31 17:07:33 +08:00
刘鑫	d9cf376e16	update voxcpm2	2026-03-31 11:50:37 +08:00
刘鑫	23ed7ffeee	fix: fix some bugs in resuming multi-GPU training	2026-03-13 18:43:07 +08:00
xliucs	7823e14b82	Merge pull request #188 from haosenwang1018/fix/bare-excepts fix: use specific exceptions instead of bare except	2026-03-03 11:49:00 +08:00
haosenwang1018	8df79de636	fix: use specific exceptions instead of bare except - lora_ft_webui.py: except (JSONDecodeError, OSError) for config file - voxcpm.py: except ImportError for triton availability check	2026-02-24 22:19:45 +00:00
xliucs	acaadb19e9	Merge pull request #186 from symhsym/patch-1 Update train_voxcpm_finetune.py	2026-02-11 18:05:39 +08:00
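symhsym	07e526a231	Update train_voxcpm_finetune.py Fixes the problem reported in issue #185: running validate during training applied to(torch.bfloat16) and then to(torch.float32) to the original model, which could make the model's numerical values drift. With this change the validate step preserves the original model's values.	2026-02-11 11:17:47 +08:00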
xliucs	7aadc6c94e	Merge pull request #161 from s3ldc/cli-arg-validation Improve CLI argument validation and help text	2026-01-24 13:06:30 +08:00
Biriy	8f3a91cac8	cli: improve argument validation and help text for VoxCPM CLI	2026-01-20 14:33:58 +05:30
xliucs	e72fb42c38	Merge pull request #147 from zanellig/main Fix README's feature checkboxes	2026-01-19 12:41:22 +08:00
Gonzalo Zanelli	6dd63a534f	fix: feature checkboxes	2026-01-18 18:03:24 -03:00
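
Commits 38d61cdf03 and 96d605b9de above describe a helper, `pick_runtime_dtype(device, dtype)`, that promotes low-precision dtypes to float32 on `mps` unless `VOXCPM_MPS_DTYPE` overrides it, and an invariant that every accepted override must parse through `get_dtype()`. The following is a minimal sketch of that behaviour, not the repository's code. It makes several assumptions: dtypes are plain strings instead of torch dtypes, `parse_dtype` is a hypothetical stand-in for `get_dtype()`, and the `env` parameter is added only so the sketch can be exercised without touching the real environment.

```python
import os

# Aliases listed in commit 96d605b9de. The canonical names on the right
# are an assumption of this sketch; "half" is deliberately absent.
_ALIASES = {
    "bfloat16": "bfloat16", "bf16": "bfloat16",
    "float16": "float16", "fp16": "float16",
    "float32": "float32", "fp32": "float32",
}
_LOW_PRECISION = {"bfloat16", "float16"}


def parse_dtype(name):
    """Hypothetical stand-in for get_dtype(): alias -> canonical name."""
    key = name.strip().lower()
    if key not in _ALIASES:
        raise ValueError(f"Unsupported dtype: {name}")
    return _ALIASES[key]


def pick_runtime_dtype(device, dtype, env=None):
    """Return the dtype to run with on the given device.

    The `env` argument is hypothetical and exists for testing only.
    """
    env = os.environ if env is None else env
    resolved = parse_dtype(dtype)
    if device != "mps":
        # CUDA and CPU paths are left untouched.
        return resolved
    override = env.get("VOXCPM_MPS_DTYPE")
    if override:
        # The override goes through the same parser, so an unsupported
        # value such as "half" fails here instead of further downstream.
        return parse_dtype(override)
    if resolved in _LOW_PRECISION:
        # Promote bf16/fp16 to float32 on Apple Silicon.
        return "float32"
    return resolved
```

Routing the override through the same parser as the checkpoint dtype is one simple way to keep the two accepted sets from drifting apart, which is the failure mode the follow-up commit addresses.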

122 Commits
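
Commit 4f4a5b9f6c in the log above fixes a guard that called `.strip()` before checking the type of `text`. The ordering issue can be illustrated with a small sketch. The function name `validate_text` and the error message are hypothetical, and the real check lives inside `_generate()`.

```python
def validate_text(text):
    """Reject non-string or blank input with ValueError.

    isinstance() comes first so that `or` short-circuits before
    .strip() is ever called on a non-string value.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    return text


def validate_text_buggy(text):
    """Original operand order, kept here only for comparison."""
    if not text.strip() or not isinstance(text, str):
        raise ValueError("text must be a non-empty string")
    return text
```

With the buggy order, an `int` argument raises `AttributeError` because `int` has no `.strip()` method. With the corrected order the same argument raises the intended `ValueError`.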