139 Commits

Author SHA1 Message Date
liuxin 19b6bf7590 fix: handle LoRA rank mismatch during inference in lora_ft_webui
Pass the selected LoRA checkpoint to load_model() on first load so the
model initializes with the correct rank from lora_config.json instead of
always defaulting to r=32.

On subsequent LoRA hot-swaps, detect rank incompatibility and
automatically reload the model with the new checkpoint's config,
preventing tensor shape mismatch errors (fixes #283).

Made-with: Cursor
2026-04-28 10:52:57 +08:00
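
The rank check described in this commit could look roughly like the sketch below. It assumes `lora_config.json` stores the rank under a top-level key `r`; that key, the helper names `read_lora_rank` and `needs_reload`, and the directory layout are illustrative assumptions, not the project's actual code.

```python
import json
from pathlib import Path

DEFAULT_RANK = 32  # the fallback rank mentioned in the commit message


def read_lora_rank(ckpt_dir, default=DEFAULT_RANK):
    """Return the rank stored in <ckpt_dir>/lora_config.json, or `default`."""
    cfg_path = Path(ckpt_dir) / "lora_config.json"
    if not cfg_path.is_file():
        return default
    with open(cfg_path, encoding="utf-8") as f:
        cfg = json.load(f)
    # Assumption: the rank lives under a top-level "r" key.
    return int(cfg.get("r", default))


def needs_reload(current_rank, ckpt_dir):
    """True when hot-swapping to ckpt_dir would change the LoRA rank."""
    return read_lora_rank(ckpt_dir) != current_rank
```

A caller would reload the model when `needs_reload(...)` is true and otherwise swap only the LoRA weights.
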
ZGY 86bff0fc82 Merge pull request #253 from SuperMarioYL/feat/validate-training-data
feat: add voxcpm validate CLI for pre-flight training data checks
2026-04-27 21:09:41 +08:00
supermario_leo dd7b78f2c0 refactor(cli): defer soundfile and voxcpm.core imports to inference commands
Move `import soundfile as sf` and `from voxcpm.core import VoxCPM` from
module-level into the functions that require model inference (load_model,
_run_single, cmd_batch), so `voxcpm validate` can run without loading
the model/inference stack.
2026-04-25 05:09:23 +08:00
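
The deferred-import pattern in this commit can be sketched as follows. The subcommand names, handlers, and the use of the standard-library `wave` module as a stand-in for a heavy dependency such as `soundfile` are assumptions made for illustration only.

```python
import argparse


def cmd_validate(args):
    """Runs without importing anything from the inference stack."""
    return f"validated {args.manifest}"


def cmd_infer(args):
    """Imports the heavy dependency only when this command actually runs."""
    import wave as heavy_backend  # stand-in for soundfile / voxcpm.core
    return f"synthesized {args.text!r} with {heavy_backend.__name__}"


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command", required=True)
    p_val = sub.add_parser("validate")
    p_val.add_argument("--manifest", required=True)
    p_val.set_defaults(func=cmd_validate)
    p_inf = sub.add_parser("infer")
    p_inf.add_argument("--text", required=True)
    p_inf.set_defaults(func=cmd_infer)
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Because the import sits inside `cmd_infer`, a missing or slow inference dependency cannot break the `validate` path.
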
supermario_leo 29577d57f8 test: fix test_cli_validate_exit_code to use --manifest flag and assert specific exit code
Pass manifest path via --manifest flag (required) instead of as a
positional argument, so the test exercises cmd_validate rather than
argparse error handling. Also assert returncode==1 and check stderr
for the FAILED/error message to prevent false positives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 10:15:57 +08:00
supermario_leo 4509becfde fix: address four validation correctness issues from review
- Invalid audio rows (bad path or sample-rate mismatch) no longer
  increment valid_samples; has_error is now set on any audio failure
- _check_audio_file now enforces the expected sample rate when soundfile
  is available, making --sample-rate actually useful
- ref_audio missing-file warning is emitted for every invalid entry
  independently, not only before the first valid one is seen
- New tests cover each of the four corrected behaviours: invalid audio
  count, sample-rate mismatch, mixed ref_audio, and CLI exit code
2026-04-22 05:06:35 +08:00
ZGY cd79a647fa Merge pull request #263 from Oumnya/fix/mps-bf16-dtype
fix(mps): force float32 on Apple Silicon to avoid bf16 quality loss
2026-04-21 18:49:48 +08:00
Oumnya 96d605b9de fix(mps): align VOXCPM_MPS_DTYPE override set with get_dtype parser
Drop "half" from _VALID_DTYPE_OVERRIDES / _LOW_PRECISION_DTYPES.
get_dtype() has never accepted "half", so VOXCPM_MPS_DTYPE=half would
pass override validation and then crash downstream with
"Unsupported dtype: half". The remaining aliases (bfloat16/bf16,
float16/fp16, float32/fp32) already cover the intended dtype space.

Adds a standalone unit check under scripts/ to guard the invariant
that every accepted override parses through get_dtype().

Addresses review feedback on #263.
2026-04-21 18:24:53 +08:00
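
The invariant guarded by this commit (every accepted override must parse through `get_dtype()`) might be checked as in the sketch below. Here `get_dtype` is a simplified stand-in that returns string names instead of torch dtypes, and `check_overrides_parse` is a hypothetical helper name.

```python
# Stand-in alias table; the real get_dtype() is assumed to return torch dtypes.
_DTYPE_ALIASES = {
    "bfloat16": "bfloat16", "bf16": "bfloat16",
    "float16": "float16", "fp16": "float16",
    "float32": "float32", "fp32": "float32",
}

# Aliases accepted for VOXCPM_MPS_DTYPE after "half" was dropped.
VALID_DTYPE_OVERRIDES = frozenset(
    {"bfloat16", "bf16", "float16", "fp16", "float32", "fp32"}
)


def get_dtype(name):
    """Resolve a dtype alias, raising ValueError for unknown names."""
    try:
        return _DTYPE_ALIASES[name]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {name}") from None


def check_overrides_parse(overrides=VALID_DTYPE_OVERRIDES):
    """Raise ValueError if any accepted override is rejected by get_dtype()."""
    for name in sorted(overrides):
        get_dtype(name)
    return True
```

Adding `"half"` back to the override set would make `check_overrides_parse()` fail, which is the regression the unit check is meant to catch.
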
ZGY a9b03a768c Merge pull request #277 from gluttony-10/main
feat: enhance control text processing in VoxCPMDemo
2026-04-21 17:11:42 +08:00
ZGY 77f847fcba Merge pull request #268 from shaun0927/fix/lora-weights-only
fix: load legacy LoRA checkpoints with weights_only=True
2026-04-21 16:55:42 +08:00
gluttony-10 d3cc88722c feat: enhance control text processing in VoxCPMDemo
Added regex to strip parentheses from control instructions in the text synthesis method to ensure compatibility with the expected prompt format. This change improves the robustness of the input handling.
2026-04-21 07:07:24 +00:00
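
One possible reading of this change is sketched below: parenthesis characters are removed from the user-supplied control instruction before it is placed into the prompt. The function name `clean_control`, the handling of full-width parentheses, and the exact pattern are assumptions; the commit does not show the actual regex.

```python
import re

# ASCII parentheses plus full-width variants (the latter is an assumption).
_PARENS = re.compile(r"[()\uFF08\uFF09]")


def clean_control(control):
    """Drop parenthesis characters and surrounding whitespace."""
    return _PARENS.sub("", control).strip()
```
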
JunghwanNA ec2acec8a1 Harden LoRA checkpoint loading against untrusted pickle payloads
LoRA is a first-class workflow in VoxCPM, and the project already prefers
safetensors plus weights-only fallback loading for base model artifacts. The
legacy LoRA .ckpt/.pth path was the remaining place that still deserialized
arbitrary pickle objects, so this switches it to weights_only=True and adds
focused regression coverage for both model loaders.

Constraint: Must preserve compatibility with tensor-only legacy LoRA checkpoints
Rejected: Remove .ckpt/.pth support entirely | too disruptive for existing users
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep LoRA artifact handling aligned with the existing safetensors-first, weights-only loading pattern
Tested: python3 -m pytest -q tests/test_lora_checkpoint_loading.py tests/test_model_utils.py -q
Not-tested: Full end-to-end LoRA hot-load with heavyweight model assets
2026-04-18 00:31:28 +09:00
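
In PyTorch, the switch described here corresponds to passing `weights_only=True` to `torch.load`. The dependency-free sketch below is not that API; it illustrates the same general idea with the standard library, an unpickler that only resolves an allow-list of globals. The allow-list contents and helper names are assumptions chosen for the example.

```python
import io
import pickle

# Globals the loader is willing to resolve; everything else is rejected.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global object.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data):
    """Unpickle bytes while refusing globals outside the allow-list."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

Plain containers and numbers still load, while a payload whose `__reduce__` points at an arbitrary callable is rejected before that callable can run.
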
xliucs 13605c5a0e Merge pull request #266 from linyueqian/docs/add-vllm-omni-references
docs: add vLLM-Omni serving references
2026-04-17 10:46:21 +08:00
Yueqian Lin afa63e6195 docs: add vLLM-Omni serving references
Document vLLM-Omni as a production serving option for VoxCPM2
alongside the existing Nano-vLLM reference. Mirrors the addition in
README_zh.md, and adds an ecosystem table entry.

Install snippet follows the upstream vLLM-Omni installation guide
(from source, since vllm-omni is rapidly evolving).

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
2026-04-16 21:19:27 -05:00
liuxin eae0a29908 docs: add ComfyUI RH link
Made-with: Cursor
2026-04-16 11:46:40 +08:00
Labmem-Zhouyx 35895982d7 Merge PR #212: perf: stateful streaming VAE decode — eliminate redundant overlap
- StreamingVAEDecoder caches CausalConv1d/CausalTransposeConv1d left-pad
  state between calls — one patch in, one patch out, no overlap
- _inference yields single-patch latents in streaming mode
- 2x faster streaming VAE decode, more accurate (max diff 0.0005 vs 0.0011)
2026-04-15 16:01:38 +08:00
Labmem-Zhouyx f7f1b78c4d fix: correct transpose conv context 2026-04-15 16:01:02 +08:00
oumnya 38d61cdf03 fix(mps): force float32 on Apple Silicon to avoid bf16 quality loss
VoxCPM checkpoints default to bfloat16. Following commit e4e0496 which
added MPS device routing, running with `device=mps` selects bf16 on
Apple Silicon. On Metal, bf16 introduces enough numerical drift in the
diffusion AR loop that the synthesized audio is glitched and trips the
model's badcase detector, which retries until the per-call retry budget
is exhausted. Effectively MPS support is unusable in the default config.

This patch adds a single helper, `pick_runtime_dtype(device, dtype)`,
that promotes any low-precision dtype to float32 when the resolved
device is `mps`. CUDA and CPU paths are untouched. An opt-out env var
`VOXCPM_MPS_DTYPE` lets users force a specific dtype on MPS once future
PyTorch / macOS releases improve bf16 stability.

Both VoxCPMModel and VoxCPM2Model adopt the helper in their __init__,
replacing what would otherwise be duplicated inline checks.

Verified locally on Apple M5 Max, PyTorch 2.11, macOS 15:
- VoxCPM2 (2B): clean output, RTF ~0.78 steady state
- VoxCPM 0.5B: clean output, RTF ~0.92
- No badcase retries fired in any test
- VOXCPM_MPS_DTYPE=bfloat16 round-trips and reproduces the original
  glitched output, confirming the override path.
2026-04-15 12:22:56 +08:00
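
A minimal sketch of the helper described in this commit follows. The `pick_runtime_dtype(device, dtype)` name and the `VOXCPM_MPS_DTYPE` variable come from the message; the `env` parameter, the alias table, and the use of string names in place of torch dtypes are assumptions added so the example runs without PyTorch.

```python
import os

# String names stand in for torch dtypes so the sketch has no dependencies.
_ALIASES = {
    "bfloat16": "bfloat16", "bf16": "bfloat16",
    "float16": "float16", "fp16": "float16",
    "float32": "float32", "fp32": "float32",
}
_LOW_PRECISION = {"bfloat16", "float16"}


def pick_runtime_dtype(device, dtype, env=None):
    """Promote low-precision dtypes to float32 on MPS unless overridden."""
    env = os.environ if env is None else env  # env is injectable for testing
    if device != "mps":
        return dtype  # CUDA and CPU paths are left untouched
    override = env.get("VOXCPM_MPS_DTYPE", "").strip().lower()
    if override:
        if override not in _ALIASES:
            raise ValueError(f"Unsupported VOXCPM_MPS_DTYPE: {override}")
        return _ALIASES[override]
    return "float32" if dtype in _LOW_PRECISION else dtype
```
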
刘鑫 1565e83efe fix: complete shared generator cleanup coverage
Move generator close handling into a shared utility and wire the core generation pipeline through it so partially-consumed prompt cache generators are cleaned up consistently across both model variants and the public VoxCPM wrapper.

Made-with: Cursor
2026-04-13 17:39:05 +08:00
刘鑫 61b36d4e56 refactor: centralize generator cleanup in model helpers
Factor repeated next-and-close patterns into a shared helper in both VoxCPM model variants so non-streaming inference cleans up generators consistently while keeping the issue reference close to the workaround.

Made-with: Cursor
2026-04-13 16:57:08 +08:00
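
The next-and-close pattern referred to in these two commits can be sketched as below. The helper name `next_and_close` and the demo generator are hypothetical; the point is that closing a partially consumed generator runs its `finally` blocks immediately instead of waiting for garbage collection.

```python
def next_and_close(gen):
    """Return the first item from `gen`, then close it so cleanup runs now."""
    try:
        return next(gen)
    finally:
        gen.close()


events = []


def cached_prompt_chunks():
    """Demo generator whose cleanup is observable through `events`."""
    try:
        yield "chunk-0"
        yield "chunk-1"
    finally:
        events.append("released")
```
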
刘鑫 b1584aec7c fix: stabilize CPU SDPA mask broadcasting
Use an explicit broadcastable attention mask shape during MiniCPM incremental decoding so CPU runtimes avoid a PyTorch SDPA dimension error without changing attention semantics.

Made-with: Cursor
2026-04-13 15:38:53 +08:00
supermario_leo 4457617953 feat: add voxcpm validate CLI for pre-flight training data checks
Add a new `validate` subcommand that checks JSONL training manifests
before starting expensive fine-tuning jobs. This catches format issues,
missing audio files, and data quality problems early.

The validator performs:
- JSONL format validation (each line must be valid JSON)
- Required column checks (text, audio)
- Audio file existence and readability verification
- Duration and text length statistics (min, max, mean, median)
- Optional ref_audio column validation
- Warnings for very short (<0.3s) or very long (>30s) audio samples

Usage:
  voxcpm validate --manifest train.jsonl
  voxcpm validate --manifest train.jsonl --sample-rate 16000 --verbose

The module uses lazy imports for soundfile, so it works even in
minimal environments. Includes 11 unit tests covering all validation
paths.
2026-04-13 03:15:50 +08:00
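
The format and required-column checks listed in this commit could be sketched as follows. Only the JSONL and column checks are shown; the audio, duration, and `ref_audio` checks are omitted. The function name, the error wording, and the choice to skip blank lines are assumptions.

```python
import json

REQUIRED_COLUMNS = ("text", "audio")  # required columns named in the commit


def validate_manifest_lines(lines):
    """Return (valid_count, errors) for an iterable of JSONL lines."""
    valid, errors = 0, []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # assumption: blank lines are ignored
        try:
            row = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(row, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            errors.append(f"line {lineno}: missing {', '.join(missing)}")
            continue
        valid += 1
    return valid, errors
```

A CLI wrapper would typically print the errors and exit non-zero when the list is not empty.
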
xliucs 5510503182 Merge pull request #246 from sharziki/fix/unclosed-file-handles
fix: close file handles in from_local() config loading
2026-04-11 13:10:04 +08:00
sharziki fb46aad9a5 fix: close file handles in from_local() config loading
Use context managers when reading config.json in VoxCPMModel.from_local()
and VoxCPM2Model.from_local() to prevent file descriptor leaks. Also add
explicit encoding="utf-8" to avoid locale-dependent decode errors.

Closes #235

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 00:01:14 -04:00
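
The fix amounts to the pattern below: open `config.json` in a `with` block and pass an explicit encoding. The standalone `load_config` function is a hypothetical wrapper; in the project the read happens inside `from_local()`.

```python
import json
import os


def load_config(model_dir):
    """Read <model_dir>/config.json, closing the handle even on errors."""
    config_path = os.path.join(model_dir, "config.json")
    # The context manager releases the file descriptor deterministically,
    # and encoding="utf-8" avoids locale-dependent decoding.
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)
```
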
刘鑫 e4e049624c update finetuning pipeline and runtime device handling
Support optional ref_audio samples in finetuning and make runtime device selection explicit while keeping auto fallback behavior consistent. Also ignore the local app override file to avoid accidental commits.

Made-with: Cursor
2026-04-11 11:08:50 +08:00
xliucs abf01b9bf3 Merge pull request #229 from kuishou68/fix/issue-228-validate-text-type-order
fix: correct isinstance/strip order in _generate() to prevent AttributeError on non-string input
2026-04-10 10:30:15 +08:00
cocoon 4f4a5b9f6c fix: correct type-check order in _generate() to prevent AttributeError on non-string input
The previous guard `not text.strip() or not isinstance(text, str)` called
.strip() before verifying that text is actually a string, causing an
AttributeError (e.g. for int input) instead of the intended ValueError.

Swap operand order so isinstance check short-circuits first.

Closes #228
2026-04-09 16:13:40 +00:00
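
The corrected guard order can be sketched as below. The function name `validate_text` and the error message are assumptions; the essential point is that `isinstance` is evaluated first, so `.strip()` is never called on a non-string.

```python
def validate_text(text):
    """Raise ValueError for non-string or blank input, else return the text."""
    # `or` short-circuits: .strip() only runs once text is known to be a str.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    return text
```

With the operands in the original order, `validate_text(123)` would have raised `AttributeError` from `int.strip` before the type check ran.
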
刘鑫 79c0cf68dd chore: remove accidentally committed app_local.py
Made-with: Cursor
2026-04-09 16:05:18 +08:00
刘鑫 75cfa3e9b8 fix: use uncompiled feat_encoder for prefill to prevent CUDA Graph dynamic shape accumulation (#209) 2026-04-09 16:00:17 +08:00
Labmem-Zhouyx 5611bd08a0 optimize app.py 2026-04-09 00:30:19 +08:00
Kevin Knoedler 66205135fc perf: stateful streaming VAE decode — eliminate redundant overlap
Streaming decode previously re-decoded 4 overlapping patches through
the VAE each step, discarding 75% of the output. Replace with stateful
decode that carries causal conv padding buffers between calls — one
patch in, one patch out, no overlap.

Changes:
- Add StreamingVAEDecoder to audiovae/audio_vae_v2.py — caches
  CausalConv1d and CausalTransposeConv1d left-pad state between calls
- AudioVAE.streaming_decode() context manager for clean lifecycle
- _inference yields single-patch latents in streaming mode
- _generate and _generate_with_prompt_cache use StreamingVAEDecoder

Streaming VAE decode time (isolated): 289ms → 148ms (2x faster)
Stateful vs full decode: cosine 1.0000, max diff 0.0005
(more accurate than previous overlap approach at max diff 0.001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:09:22 -07:00
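
The idea of carrying causal-convolution padding between calls can be illustrated with a plain-Python 1-D example. This is a sketch of the general technique only, not the `StreamingVAEDecoder` code: the class name, the list-based signal, and the absence of transposed convolutions are simplifying assumptions.

```python
def causal_conv_full(x, kernel):
    """Causal 1-D convolution over a whole sequence with zero left padding."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(x))]


class StreamingCausalConv:
    """Processes chunks one at a time, caching the left-pad state."""

    def __init__(self, kernel):
        self.kernel = list(kernel)
        self.state = [0.0] * (len(kernel) - 1)  # zeros before the first chunk

    def step(self, chunk):
        k = len(self.kernel)
        padded = self.state + list(chunk)
        out = [sum(self.kernel[j] * padded[i + j] for j in range(k))
               for i in range(len(chunk))]
        if k > 1:
            # Keep the last k-1 inputs as left context for the next call.
            self.state = padded[-(k - 1):]
        return out
```

Feeding a sequence chunk by chunk through `step` produces the same samples as `causal_conv_full` on the whole sequence, with each input sample processed once and no overlapping patches re-decoded.
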
Labmem-Zhouyx 364eff6840 update readme: python version 2026-04-08 23:07:38 +08:00
Labmem-Zhouyx 6d10932b09 update readme 2026-04-08 18:48:58 +08:00
Labmem-Zhouyx 68af4fe502 fix: ft log and setting (tag: 2.0.2) 2026-04-08 18:15:17 +08:00
Labmem-Zhouyx ee3649c1b3 fix: streaming decode 2026-04-08 17:25:54 +08:00
Labmem-Zhouyx 82d77d445c fix: decode chunksize for audiovae_v2 2026-04-08 16:31:36 +08:00
Labmem-Zhouyx 8f95d13073 update readme: 30-language asr result on internal benchmark 2026-04-08 15:36:56 +08:00
Labmem-Zhouyx df38f0a167 update readme for modelscope download (tag: 2.0.1) 2026-04-08 11:29:19 +08:00
Labmem-Zhouyx 9adfaf6996 update demo for zh 2026-04-08 00:15:16 +08:00
刘鑫 46cfce0c97 fix VoxCPM2 training sample_rate: 48000 -> 16000 (match AudioVAE encoder)
Made-with: Cursor
2026-04-07 22:59:18 +08:00
Labmem-Zhouyx da700f264e update ZH readme 2026-04-07 18:04:56 +08:00
Labmem-Zhouyx 9da570d409 remove wechat link 2026-04-07 15:29:12 +08:00
Labmem-Zhouyx 9374524c47 update readme 2026-04-06 23:01:16 +08:00
Labmem-Zhouyx ec6d30e996 update readme 2026-04-06 22:56:06 +08:00
Labmem-Zhouyx a010d621ff update readme
Made-with: Cursor
Tag: 2.0.0
2026-04-06 22:09:24 +08:00
Dennis Huang 3f005b0dbd Enhance README formatting and community section for better visibility 2026-04-06 19:50:29 +08:00
Labmem-Zhouyx 039c6e9f92 update 2026-04-06 17:15:10 +08:00
Dennis Huang 5734ab36b6 Update README 2026-04-06 16:24:12 +08:00
Labmem-Zhouyx 746631c38d update 2026-04-06 16:10:50 +08:00
Labmem-Zhouyx 07b8b5c01f update readme 2026-04-06 15:53:58 +08:00
Labmem-Zhouyx f738cc9946 update 2026-04-03 18:46:29 +08:00