Commit Graph

6 Commits

Author SHA1 Message Date
supermario_leo 4457617953 feat: add voxcpm validate CLI for pre-flight training data checks
Add a new `validate` subcommand that checks JSONL training manifests
before starting expensive fine-tuning jobs. This catches format issues,
missing audio files, and data quality problems early.

The validator performs:
- JSONL format validation (each line must be valid JSON)
- Required column checks (text, audio)
- Audio file existence and readability verification
- Duration and text length statistics (min, max, mean, median)
- Optional ref_audio column validation
- Warnings for very short (<0.3s) or very long (>30s) audio samples
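
The checks listed above could look roughly like the following. This is a minimal sketch, not the code from this commit: the function name `validate_manifest`, the shape of the report dict, and the rule that relative audio paths resolve against the manifest's directory are all assumptions made for illustration. It covers only the JSONL, required-column, file-existence, and text-length checks, and skips audio duration so that it runs on the standard library alone.

```python
import json
import os
import statistics

# Required columns as named in the commit message.
REQUIRED_COLUMNS = ("text", "audio")


def validate_manifest(manifest_path):
    """Hypothetical helper: check a JSONL manifest and return a report dict."""
    errors = []
    text_lengths = []
    rows = 0
    # Assumption: relative audio paths are resolved against the manifest's folder.
    base_dir = os.path.dirname(os.path.abspath(manifest_path))

    with open(manifest_path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            rows += 1

            # 1. Each line must be valid JSON (and a JSON object).
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(row, dict):
                errors.append(f"line {lineno}: expected a JSON object")
                continue

            # 2. Required columns must be present.
            missing = [c for c in REQUIRED_COLUMNS if c not in row]
            if missing:
                errors.append(f"line {lineno}: missing column(s): {', '.join(missing)}")
                continue

            # 3. The referenced audio file must exist.
            audio_path = os.path.join(base_dir, str(row["audio"]))
            if not os.path.isfile(audio_path):
                errors.append(f"line {lineno}: audio file not found: {row['audio']}")

            text_lengths.append(len(str(row["text"])))

    # 4. Text length statistics (min, max, mean, median).
    stats = {}
    if text_lengths:
        stats = {
            "min": min(text_lengths),
            "max": max(text_lengths),
            "mean": statistics.mean(text_lengths),
            "median": statistics.median(text_lengths),
        }
    return {"rows": rows, "errors": errors, "text_length": stats}
```

Collecting every error into a list instead of raising on the first one lets a single pass report all problems in the manifest, which fits the stated goal of catching issues before an expensive job starts.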

Usage:
  voxcpm validate --manifest train.jsonl
  voxcpm validate --manifest train.jsonl --sample-rate 16000 --verbose

The module uses lazy imports for soundfile, so it works even in
minimal environments. Includes 11 unit tests covering all validation
paths.
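
A lazy import of this kind is commonly written as below. This is only a sketch of the general pattern: the helper names `load_soundfile` and `audio_duration_seconds` are hypothetical, and returning `None` when soundfile is unavailable is an assumption, since the commit message does not say how the validator behaves in that case.

```python
import importlib


def load_soundfile():
    """Import soundfile on first use; return None if it is unavailable."""
    try:
        return importlib.import_module("soundfile")
    except (ImportError, OSError):
        # ImportError: package not installed.
        # OSError: package installed but the libsndfile shared library is missing.
        return None


def audio_duration_seconds(path):
    """Return the duration in seconds, or None when soundfile cannot be loaded."""
    sf = load_soundfile()
    if sf is None:
        return None  # assumption: skip the duration check instead of failing
    return sf.info(path).duration
```

Because the import happens inside the function, importing the module that defines these helpers never fails in an environment without soundfile; only the duration check is affected.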
2026-04-13 03:15:50 +08:00
刘鑫 e4e049624c update finetuning pipeline and runtime device handling
Support optional ref_audio samples in finetuning and make runtime device selection explicit while keeping auto fallback behavior consistent. Also ignore the local app override file to avoid accidental commits.

Made-with: Cursor
2026-04-11 11:08:50 +08:00
刘鑫 d9cf376e16 update voxcpm2 2026-03-31 11:50:37 +08:00
刘鑫 e8dd956fc2 Print all log messages to stderr instead of stdout 2026-01-12 15:30:45 +08:00
jayllfpt de11c6a8cb OPTIMIZE: Improve sample length computation by using batch column access 2025-12-20 06:32:39 +07:00
Labmem-Zhouyx 461ad7e506 Update: VoxCPM1.5 and fine-tuning support 2025-12-05 21:00:01 +08:00