VoxCPM

Author	SHA1	Message	Date
liuxin	eae0a29908	docs: add ComfyUI RH link Made-with: Cursor	2026-04-16 11:46:40 +08:00
Labmem-Zhouyx	35895982d7	Merge PR #212 : perf: stateful streaming VAE decode — eliminate redundant overlap - StreamingVAEDecoder caches CausalConv1d/CausalTransposeConv1d left-pad state between calls — one patch in, one patch out, no overlap - _inference yields single-patch latents in streaming mode - 2x faster streaming VAE decode, more accurate (max diff 0.0005 vs 0.0011)	2026-04-15 16:01:38 +08:00
Labmem-Zhouyx	f7f1b78c4d	fix: correct transpose conv context	2026-04-15 16:01:02 +08:00
刘鑫	1565e83efe	fix: complete shared generator cleanup coverage Move generator close handling into a shared utility and wire the core generation pipeline through it so partially-consumed prompt cache generators are cleaned up consistently across both model variants and the public VoxCPM wrapper. Made-with: Cursor	2026-04-13 17:39:05 +08:00
刘鑫	61b36d4e56	refactor: centralize generator cleanup in model helpers Factor repeated next-and-close patterns into a shared helper in both VoxCPM model variants so non-streaming inference cleans up generators consistently while keeping the issue reference close to the workaround. Made-with: Cursor	2026-04-13 16:57:08 +08:00
刘鑫	b1584aec7c	fix: stabilize CPU SDPA mask broadcasting Use an explicit broadcastable attention mask shape during MiniCPM incremental decoding so CPU runtimes avoid a PyTorch SDPA dimension error without changing attention semantics. Made-with: Cursor	2026-04-13 15:38:53 +08:00
xliucs	5510503182	Merge pull request #246 from sharziki/fix/unclosed-file-handles fix: close file handles in from_local() config loading	2026-04-11 13:10:04 +08:00
sharziki	fb46aad9a5	fix: close file handles in from_local() config loading Use context managers when reading config.json in VoxCPMModel.from_local() and VoxCPM2Model.from_local() to prevent file descriptor leaks. Also add explicit encoding="utf-8" to avoid locale-dependent decode errors. Closes #235 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 00:01:14 -04:00
刘鑫	e4e049624c	update finetuning pipeline and runtime device handling Support optional ref_audio samples in finetuning and make runtime device selection explicit while keeping auto fallback behavior consistent. Also ignore the local app override file to avoid accidental commits. Made-with: Cursor	2026-04-11 11:08:50 +08:00
xliucs	abf01b9bf3	Merge pull request #229 from kuishou68/fix/issue-228-validate-text-type-order fix: correct isinstance/strip order in _generate() to prevent AttributeError on non-string input	2026-04-10 10:30:15 +08:00
cocoon	4f4a5b9f6c	fix: correct type-check order in _generate() to prevent AttributeError on non-string input The previous guard `not text.strip() or not isinstance(text, str)` called .strip() before verifying that text is actually a string, causing an AttributeError (e.g. for int input) instead of the intended ValueError. Swap operand order so isinstance check short-circuits first. Closes #228	2026-04-09 16:13:40 +00:00
刘鑫	79c0cf68dd	chore: remove accidentally committed app_local.py Made-with: Cursor	2026-04-09 16:05:18 +08:00
刘鑫	75cfa3e9b8	fix: use uncompiled feat_encoder for prefill to prevent CUDA Graph dynamic shape accumulation (#209)	2026-04-09 16:00:17 +08:00
Labmem-Zhouyx	5611bd08a0	optimize app.py	2026-04-09 00:30:19 +08:00
Kevin Knoedler	66205135fc	perf: stateful streaming VAE decode — eliminate redundant overlap Streaming decode previously re-decoded 4 overlapping patches through the VAE each step, discarding 75% of the output. Replace with stateful decode that carries causal conv padding buffers between calls — one patch in, one patch out, no overlap. Changes: - Add StreamingVAEDecoder to audiovae/audio_vae_v2.py — caches CausalConv1d and CausalTransposeConv1d left-pad state between calls - AudioVAE.streaming_decode() context manager for clean lifecycle - _inference yields single-patch latents in streaming mode - _generate and _generate_with_prompt_cache use StreamingVAEDecoder Streaming VAE decode time (isolated): 289ms → 148ms (2x faster) Stateful vs full decode: cosine 1.0000, max diff 0.0005 (more accurate than previous overlap approach at max diff 0.001) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 09:09:22 -07:00
Labmem-Zhouyx	364eff6840	update readme: python version	2026-04-08 23:07:38 +08:00
Labmem-Zhouyx	6d10932b09	update readme	2026-04-08 18:48:58 +08:00
Labmem-Zhouyx	68af4fe502	fix: ft log and setting 2.0.2	2026-04-08 18:15:17 +08:00
Labmem-Zhouyx	ee3649c1b3	fix: streaming decode	2026-04-08 17:25:54 +08:00
Labmem-Zhouyx	82d77d445c	fix: decode chunksize for audiovae_v2	2026-04-08 16:31:36 +08:00
Labmem-Zhouyx	8f95d13073	update readme: 30-language asr result on internal benchmark	2026-04-08 15:36:56 +08:00
Labmem-Zhouyx	df38f0a167	update readme for modelscope download 2.0.1	2026-04-08 11:29:19 +08:00
Labmem-Zhouyx	9adfaf6996	update demo for zh	2026-04-08 00:15:16 +08:00
刘鑫	46cfce0c97	fix VoxCPM2 training sample_rate: 48000 -> 16000 (match AudioVAE encoder) Made-with: Cursor	2026-04-07 22:59:18 +08:00
Labmem-Zhouyx	da700f264e	update ZH readme	2026-04-07 18:04:56 +08:00
Labmem-Zhouyx	9da570d409	remove wechat link	2026-04-07 15:29:12 +08:00
Labmem-Zhouyx	9374524c47	update readme	2026-04-06 23:01:16 +08:00
Labmem-Zhouyx	ec6d30e996	update readme	2026-04-06 22:56:06 +08:00
Labmem-Zhouyx	a010d621ff	update readme Made-with: Cursor 2.0.0	2026-04-06 22:09:24 +08:00
Dennis Huang	3f005b0dbd	Enhance README formatting and community section for better visibility	2026-04-06 19:50:29 +08:00
Labmem-Zhouyx	039c6e9f92	update	2026-04-06 17:15:10 +08:00
Dennis Huang	5734ab36b6	Update README	2026-04-06 16:24:12 +08:00
Labmem-Zhouyx	746631c38d	update	2026-04-06 16:10:50 +08:00
Labmem-Zhouyx	07b8b5c01f	update readme	2026-04-06 15:53:58 +08:00
Labmem-Zhouyx	f738cc9946	update	2026-04-03 18:46:29 +08:00
Labmem-Zhouyx	0c2cf23617	Update app.py UI, adjust streaming_prefix_len, remove legacy docs - Refine app.py: Ultimate Cloning naming, NFE slider, i18n polish - Change streaming_prefix_len default from 3 to 4 for smoother decoding - Remove legacy docs/ directory (migrated to ReadTheDocs) Made-with: Cursor	2026-04-03 18:42:41 +08:00
Labmem-Zhouyx	b823d8107c	Merge branch 'dev_2.0' of https://github.com/OpenBMB/VoxCPM into dev_2.0	2026-04-03 17:44:46 +08:00
刘鑫	a87739426f	add voxcpm2 finetune conf	2026-04-03 14:23:15 +08:00
Labmem-Zhouyx	12c2b8ff98	update readme	2026-04-02 21:01:23 +08:00
刘鑫	30c300cfe8	adjust default cfg range	2026-04-02 18:14:35 +08:00
刘鑫	addee2c550	support voxcpm2 cli	2026-04-01 21:15:55 +08:00
Labmem-Zhouyx	42c428164c	feat: add no_rope support for residual LM and fix streaming continuation decoding - Add `residual_lm_no_rope` config option in VoxCPMConfig and propagate to MiniCPMModel - Add `no_rope` field to MiniCPM4Config; make RoPE embedding optional in MiniCPMModel and MiniCPMAttention - Add `streaming_prefix_len` parameter to generation interface - Fix non-streaming audio decode in continuation mode to trim leading prefix patches consistently - Refactor streaming prefix context preparation: distinguish continuation vs. zero-shot via feat_mask trailing bit instead of audio_mask sum Made-with: Cursor	2026-03-31 17:07:33 +08:00
刘鑫	d9cf376e16	update voxcpm2	2026-03-31 11:50:37 +08:00
刘鑫	23ed7ffeee	fix: fix some bugs in resuming multi-GPU training	2026-03-13 18:43:07 +08:00
xliucs	7823e14b82	Merge pull request #188 from haosenwang1018/fix/bare-excepts fix: use specific exceptions instead of bare except	2026-03-03 11:49:00 +08:00
haosenwang1018	8df79de636	fix: use specific exceptions instead of bare except - lora_ft_webui.py: except (JSONDecodeError, OSError) for config file - voxcpm.py: except ImportError for triton availability check	2026-02-24 22:19:45 +00:00
xliucs	acaadb19e9	Merge pull request #186 from symhsym/patch-1 Update train_voxcpm_finetune.py	2026-02-11 18:05:39 +08:00
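symhsym	07e526a231	Update train_voxcpm_finetune.py Fixes the problem reported in issue #185: running validate during training applies to(torch.bfloat16) and then to(torch.float32) to the original model, which can cause the model's numerical values to drift, so this change makes the validate step preserve the original model values	2026-02-11 11:17:47 +08:00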
xliucs	7aadc6c94e	Merge pull request #161 from s3ldc/cli-arg-validation Improve CLI argument validation and help text	2026-01-24 13:06:30 +08:00
Biriy	8f3a91cac8	cli: improve argument validation and help text for VoxCPM CLI	2026-01-20 14:33:58 +05:30
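
The guard-order fix described in commit 4f4a5b9f6c can be illustrated with a minimal sketch. The function names and the error message below are hypothetical and are not taken from the VoxCPM source; only the two operand orders come from the commit message.

```python
def validate_text(text):
    """Hypothetical stand-in for the corrected input check."""
    # isinstance() is evaluated first, so .strip() is only reached for real strings.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("target text must be a non-empty string")
    return text.strip()


def validate_text_old_order(text):
    """Same check with the operand order the commit message calls the previous guard."""
    # .strip() runs before the type check, so a non-string fails with AttributeError.
    if not text.strip() or not isinstance(text, str):
        raise ValueError("target text must be a non-empty string")
    return text.strip()
```

With the corrected order, an integer input raises the intended ValueError; with the old order, the same input fails earlier with AttributeError because `int` has no `.strip()` method.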


124 Commits
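
The idea behind the stateful streaming decode in commit 66205135fc (carrying causal-convolution left-pad state between calls instead of re-decoding overlapping patches) can be sketched in plain Python. This is a toy one-dimensional example under stated assumptions, not the StreamingVAEDecoder implementation: `causal_conv1d`, the kernel, and the chunk size are all hypothetical.

```python
def causal_conv1d(chunk, kernel, state=None):
    """Toy causal convolution that returns its output plus the state to carry forward.

    `state` holds the last len(kernel) - 1 inputs seen so far; on the first
    call it defaults to zeros, which matches ordinary causal left-padding.
    """
    pad = len(kernel) - 1
    if state is None:
        state = [0] * pad
    buf = list(state) + list(chunk)
    # Each output sample only looks at the current input and the `pad` inputs before it.
    out = [sum(w * buf[i + j] for j, w in enumerate(kernel)) for i in range(len(chunk))]
    # Keep the trailing `pad` inputs as the left context for the next call.
    new_state = buf[len(buf) - pad:]
    return out, new_state


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, -2, 3]

# Reference: process the whole signal in one call.
full, _ = causal_conv1d(signal, kernel)

# Streaming: one chunk in, one chunk out, no overlapping re-computation.
streamed, state = [], None
for start in range(0, len(signal), 2):
    out, state = causal_conv1d(signal[start:start + 2], kernel, state)
    streamed.extend(out)
```

Because the cached state supplies exactly the left context each chunk needs, `streamed` matches `full` in this integer example. In a real decoder with floating-point layers, small numerical differences like those reported in the commit message are to be expected.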