Commit Graph

127 Commits

Author SHA1 Message Date
JunghwanNA ec2acec8a1 Harden LoRA checkpoint loading against untrusted pickle payloads
LoRA is a first-class workflow in VoxCPM, and the project already prefers
safetensors plus weights-only fallback loading for base model artifacts. The
legacy LoRA .ckpt/.pth path was the remaining place that still deserialized
arbitrary pickle objects, so this switches it to weights_only=True and adds
focused regression coverage for both model loaders.

Constraint: Must preserve compatibility with tensor-only legacy LoRA checkpoints
Rejected: Remove .ckpt/.pth support entirely | too disruptive for existing users
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep LoRA artifact handling aligned with the existing safetensors-first, weights-only loading pattern
Tested: python3 -m pytest -q tests/test_lora_checkpoint_loading.py tests/test_model_utils.py -q
Not-tested: Full end-to-end LoRA hot-load with heavyweight model assets
2026-04-18 00:31:28 +09:00
xliucs 13605c5a0e Merge pull request #266 from linyueqian/docs/add-vllm-omni-references
docs: add vLLM-Omni serving references
2026-04-17 10:46:21 +08:00
Yueqian Lin afa63e6195 docs: add vLLM-Omni serving references
Document vLLM-Omni as a production serving option for VoxCPM2
alongside the existing Nano-vLLM reference. Mirrors the addition in
README_zh.md, and adds an ecosystem table entry.

Install snippet follows the upstream vLLM-Omni installation guide
(from source, since vllm-omni is rapidly evolving).

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
2026-04-16 21:19:27 -05:00
liuxin eae0a29908 docs: add ComfyUI RH link
Made-with: Cursor
2026-04-16 11:46:40 +08:00
Labmem-Zhouyx 35895982d7 Merge PR #212: perf: stateful streaming VAE decode — eliminate redundant overlap
- StreamingVAEDecoder caches CausalConv1d/CausalTransposeConv1d left-pad
  state between calls — one patch in, one patch out, no overlap
- _inference yields single-patch latents in streaming mode
- 2x faster streaming VAE decode, more accurate (max diff 0.0005 vs 0.0011)
2026-04-15 16:01:38 +08:00
Labmem-Zhouyx f7f1b78c4d fix: correct transpose conv context 2026-04-15 16:01:02 +08:00
刘鑫 1565e83efe fix: complete shared generator cleanup coverage
Move generator close handling into a shared utility and wire the core generation pipeline through it so partially-consumed prompt cache generators are cleaned up consistently across both model variants and the public VoxCPM wrapper.

Made-with: Cursor
2026-04-13 17:39:05 +08:00
刘鑫 61b36d4e56 refactor: centralize generator cleanup in model helpers
Factor repeated next-and-close patterns into a shared helper in both VoxCPM model variants so non-streaming inference cleans up generators consistently while keeping the issue reference close to the workaround.

Made-with: Cursor
2026-04-13 16:57:08 +08:00
刘鑫 b1584aec7c fix: stabilize CPU SDPA mask broadcasting
Use an explicit broadcastable attention mask shape during MiniCPM incremental decoding so CPU runtimes avoid a PyTorch SDPA dimension error without changing attention semantics.

Made-with: Cursor
2026-04-13 15:38:53 +08:00
xliucs 5510503182 Merge pull request #246 from sharziki/fix/unclosed-file-handles
fix: close file handles in from_local() config loading
2026-04-11 13:10:04 +08:00
sharziki fb46aad9a5 fix: close file handles in from_local() config loading
Use context managers when reading config.json in VoxCPMModel.from_local()
and VoxCPM2Model.from_local() to prevent file descriptor leaks. Also add
explicit encoding="utf-8" to avoid locale-dependent decode errors.

Closes #235

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 00:01:14 -04:00
刘鑫 e4e049624c update finetuning pipeline and runtime device handling
Support optional ref_audio samples in finetuning and make runtime device selection explicit while keeping auto fallback behavior consistent. Also ignore the local app override file to avoid accidental commits.

Made-with: Cursor
2026-04-11 11:08:50 +08:00
xliucs abf01b9bf3 Merge pull request #229 from kuishou68/fix/issue-228-validate-text-type-order
fix: correct isinstance/strip order in _generate() to prevent AttributeError on non-string input
2026-04-10 10:30:15 +08:00
cocoon 4f4a5b9f6c fix: correct type-check order in _generate() to prevent AttributeError on non-string input
The previous guard `not text.strip() or not isinstance(text, str)` called
.strip() before verifying that text is actually a string, causing an
AttributeError (e.g. for int input) instead of the intended ValueError.

Swap operand order so isinstance check short-circuits first.

Closes #228
2026-04-09 16:13:40 +00:00
刘鑫 79c0cf68dd chore: remove accidentally committed app_local.py
Made-with: Cursor
2026-04-09 16:05:18 +08:00
刘鑫 75cfa3e9b8 fix: use uncompiled feat_encoder for prefill to prevent CUDA Graph dynamic shape accumulation (#209) 2026-04-09 16:00:17 +08:00
Labmem-Zhouyx 5611bd08a0 optim app.py 2026-04-09 00:30:19 +08:00
Kevin Knoedler 66205135fc perf: stateful streaming VAE decode — eliminate redundant overlap
Streaming decode previously re-decoded 4 overlapping patches through
the VAE each step, discarding 75% of the output. Replace with stateful
decode that carries causal conv padding buffers between calls — one
patch in, one patch out, no overlap.

Changes:
- Add StreamingVAEDecoder to audiovae/audio_vae_v2.py — caches
  CausalConv1d and CausalTransposeConv1d left-pad state between calls
- AudioVAE.streaming_decode() context manager for clean lifecycle
- _inference yields single-patch latents in streaming mode
- _generate and _generate_with_prompt_cache use StreamingVAEDecoder

Streaming VAE decode time (isolated): 289ms → 148ms (2x faster)
Stateful vs full decode: cosine 1.0000, max diff 0.0005
(more accurate than previous overlap approach at max diff 0.001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:09:22 -07:00
Labmem-Zhouyx 364eff6840 update readme: python version 2026-04-08 23:07:38 +08:00
Labmem-Zhouyx 6d10932b09 update readme 2026-04-08 18:48:58 +08:00
Labmem-Zhouyx 68af4fe502 fix: ft log and setting 2.0.2 2026-04-08 18:15:17 +08:00
Labmem-Zhouyx ee3649c1b3 fix: streaming decode 2026-04-08 17:25:54 +08:00
Labmem-Zhouyx 82d77d445c fix: decode chunksize for audiovae_v2 2026-04-08 16:31:36 +08:00
Labmem-Zhouyx 8f95d13073 update readme: 30-language asr result on internal benchmark 2026-04-08 15:36:56 +08:00
Labmem-Zhouyx df38f0a167 update readme for modelscope download 2.0.1 2026-04-08 11:29:19 +08:00
Labmem-Zhouyx 9adfaf6996 update demo for zh 2026-04-08 00:15:16 +08:00
刘鑫 46cfce0c97 fix VoxCPM2 training sample_rate: 48000 -> 16000 (match AudioVAE encoder)
Made-with: Cursor
2026-04-07 22:59:18 +08:00
Labmem-Zhouyx da700f264e update ZH readme 2026-04-07 18:04:56 +08:00
Labmem-Zhouyx 9da570d409 remove wechat link 2026-04-07 15:29:12 +08:00
Labmem-Zhouyx 9374524c47 update readme 2026-04-06 23:01:16 +08:00
Labmem-Zhouyx ec6d30e996 update readme 2026-04-06 22:56:06 +08:00
Labmem-Zhouyx a010d621ff update readme
Made-with: Cursor
2.0.0
2026-04-06 22:09:24 +08:00
Dennis Huang 3f005b0dbd Enhance README formatting and community section for better visibility 2026-04-06 19:50:29 +08:00
Labmem-Zhouyx 039c6e9f92 update 2026-04-06 17:15:10 +08:00
Dennis Huang 5734ab36b6 Update README 2026-04-06 16:24:12 +08:00
Labmem-Zhouyx 746631c38d update 2026-04-06 16:10:50 +08:00
Labmem-Zhouyx 07b8b5c01f update readme 2026-04-06 15:53:58 +08:00
Labmem-Zhouyx f738cc9946 update 2026-04-03 18:46:29 +08:00
Labmem-Zhouyx 0c2cf23617 Update app.py UI, adjust streaming_prefix_len, remove legacy docs
- Refine app.py: Ultimate Cloning naming, NFE slider, i18n polish
- Change streaming_prefix_len default from 3 to 4 for smoother decoding
- Remove legacy docs/ directory (migrated to ReadTheDocs)

Made-with: Cursor
2026-04-03 18:42:41 +08:00
Labmem-Zhouyx b823d8107c Merge branch 'dev_2.0' of https://github.com/OpenBMB/VoxCPM into dev_2.0 2026-04-03 17:44:46 +08:00
刘鑫 a87739426f add voxcpm2 finetune conf 2026-04-03 14:23:15 +08:00
Labmem-Zhouyx 12c2b8ff98 update readme 2026-04-02 21:01:23 +08:00
刘鑫 30c300cfe8 adjust default cfg range 2026-04-02 18:14:35 +08:00
刘鑫 addee2c550 surport voxcpm2 cli 2026-04-01 21:15:55 +08:00
Labmem-Zhouyx 42c428164c feat: add no_rope support for residual LM and fix streaming continuation decoding
- Add `residual_lm_no_rope` config option in VoxCPMConfig and propagate to MiniCPMModel
- Add `no_rope` field to MiniCPM4Config; make RoPE embedding optional in MiniCPMModel and MiniCPMAttention
- Add `streaming_prefix_len` parameter to generation interface
- Fix non-streaming audio decode in continuation mode to trim leading prefix patches consistently
- Refactor streaming prefix context preparation: distinguish continuation vs. zero-shot via feat_mask trailing bit instead of audio_mask sum

Made-with: Cursor
2026-03-31 17:07:33 +08:00
刘鑫 d9cf376e16 update voxcpm2 2026-03-31 11:50:37 +08:00
刘鑫 23ed7ffeee fix: fix some bugs in resuming multi-GPU training 2026-03-13 18:43:07 +08:00
xliucs 7823e14b82 Merge pull request #188 from haosenwang1018/fix/bare-excepts
fix: use specific exceptions instead of bare except
2026-03-03 11:49:00 +08:00
haosenwang1018 8df79de636 fix: use specific exceptions instead of bare except
- lora_ft_webui.py: except (JSONDecodeError, OSError) for config file
- voxcpm.py: except ImportError for triton availability check
2026-02-24 22:19:45 +00:00
xliucs acaadb19e9 Merge pull request #186 from symhsym/patch-1
Update train_voxcpm_finetune.py
2026-02-11 18:05:39 +08:00