Pass manifest path via --manifest flag (required) instead of as a
positional argument, so the test exercises cmd_validate rather than
argparse error handling. Also assert returncode==1 and check stderr
for the FAILED/error message to prevent false positives.
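
The difference between the two invocations can be sketched as follows. This is a minimal illustration, not the project's test: `FAKE_CLI`, `run_cli`, and `demo` are hypothetical names, and the stand-in script only assumes that the real command requires --manifest and exits with 1 after printing a FAILED message.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for the real CLI: --manifest is required, and a
# missing manifest file makes the command print FAILED and exit with 1.
FAKE_CLI = textwrap.dedent(
    """
    import argparse, os, sys
    parser = argparse.ArgumentParser(prog="fake-validate")
    parser.add_argument("--manifest", required=True)
    args = parser.parse_args()
    if not os.path.isfile(args.manifest):
        print("FAILED: manifest not found", file=sys.stderr)
        sys.exit(1)
    """
)


def run_cli(cli_path, *args):
    """Run the stand-in CLI in a subprocess and capture its output."""
    return subprocess.run(
        [sys.executable, str(cli_path), *args],
        capture_output=True,
        text=True,
    )


def demo():
    """Return the results of the positional and the flagged invocation."""
    with tempfile.TemporaryDirectory() as tmp:
        cli = Path(tmp) / "fake_cli.py"
        cli.write_text(FAKE_CLI, encoding="utf-8")
        missing = str(Path(tmp) / "missing.jsonl")
        # Positional path: argparse rejects the call before any validation runs.
        positional = run_cli(cli, missing)
        # Flagged path: the validation code runs and reports the failure itself.
        flagged = run_cli(cli, "--manifest", missing)
    return positional, flagged
```

argparse exits with status 2 on a usage error, so asserting only on a non-zero return code would pass for either invocation. Checking for exactly 1 plus the stderr message tells the two apart.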
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Invalid audio rows (bad path or sample-rate mismatch) no longer
increment valid_samples; has_error is now set on any audio failure
- _check_audio_file now enforces the expected sample rate when soundfile
is available, making --sample-rate actually useful
- ref_audio missing-file warning is emitted for every invalid entry
independently, not only before the first valid one is seen
- New tests cover each of the four corrected behaviours: invalid audio
count, sample-rate mismatch, mixed ref_audio, and CLI exit code
Add a new `validate` subcommand that checks JSONL training manifests
before starting expensive fine-tuning jobs. This catches format issues,
missing audio files, and data quality problems early.
The validator performs:
- JSONL format validation (each line must be valid JSON)
- Required column checks (text, audio)
- Audio file existence and readability verification
- Duration and text length statistics (min, max, mean, median)
- Optional ref_audio column validation
- Warnings for very short (<0.3s) or very long (>30s) audio samples
Usage:
voxcpm validate --manifest train.jsonl
voxcpm validate --manifest train.jsonl --sample-rate 16000 --verbose
The module uses lazy imports for soundfile, so it works even in
minimal environments. Includes 11 unit tests covering all validation
paths.
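
The format and required-column checks above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's code: `validate_lines` and `text_length_stats` are hypothetical names, blank lines are assumed to be skipped, and audio checks are left out.

```python
import json
import statistics

REQUIRED_COLUMNS = ("text", "audio")  # column names taken from the list above


def validate_lines(lines):
    """Return (valid_rows, errors) for an iterable of JSONL lines."""
    rows, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(row, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            errors.append(f"line {lineno}: missing {', '.join(missing)}")
            continue
        rows.append(row)
    return rows, errors


def text_length_stats(rows):
    """Summarize text lengths (min, max, mean, median) of the valid rows."""
    lengths = [len(str(row["text"])) for row in rows]
    if not lengths:
        return None
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }
```

Errors are collected per line instead of raised on first failure, so a single run reports every problem in the manifest.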
Use context managers when reading config.json in VoxCPMModel.from_local()
and VoxCPM2Model.from_local() to prevent file descriptor leaks. Also add
explicit encoding="utf-8" to avoid locale-dependent decode errors.
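
A minimal sketch of the pattern, assuming a helper named `load_config` (hypothetical; the actual from_local() methods do more than this):

```python
import json
import os


def load_config(model_dir):
    """Read config.json from model_dir and return the parsed dict."""
    config_path = os.path.join(model_dir, "config.json")
    # The with-block closes the file even if json.load raises, and the
    # explicit encoding avoids depending on the platform's default locale.
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f)
```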
Closes #235
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Support optional ref_audio samples in fine-tuning and make runtime device selection explicit while keeping the auto fallback behavior consistent. Also ignore the local app override file to avoid accidental commits.
Made-with: Cursor
The previous guard `not text.strip() or not isinstance(text, str)` called
.strip() before verifying that text is actually a string, causing an
AttributeError (e.g. for int input) instead of the intended ValueError.
Swap operand order so isinstance check short-circuits first.
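
The corrected ordering can be sketched as follows; `check_text` is a hypothetical wrapper used only to show the guard in isolation:

```python
def check_text(text):
    """Raise ValueError unless text is a non-empty string."""
    # isinstance comes first, so `or` short-circuits for non-strings and
    # .strip() is only ever called on an actual str.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    return text
```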
Closes #228
- Add `residual_lm_no_rope` config option in VoxCPMConfig and propagate to MiniCPMModel
- Add `no_rope` field to MiniCPM4Config; make RoPE embedding optional in MiniCPMModel and MiniCPMAttention
- Add `streaming_prefix_len` parameter to generation interface
- Fix non-streaming audio decode in continuation mode to trim leading prefix patches consistently
- Refactor streaming prefix context preparation: distinguish continuation vs. zero-shot via feat_mask trailing bit instead of audio_mask sum
Made-with: Cursor