# VoxCPM Fine-tuning Guide

This guide covers how to fine-tune VoxCPM models with two approaches: full fine-tuning and LoRA fine-tuning.

### 🎓 SFT (Supervised Fine-Tuning)

Full fine-tuning updates all model parameters. Suitable for:
- 📊 Large, specialized datasets
- 🔄 Cases where significant behavior changes are needed

### ⚡ LoRA Fine-tuning

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that:
- 🎯 Trains only a small number of additional parameters
- 💾 Significantly reduces memory requirements and training time
- 🔀 Supports multiple LoRA adapters with hot-swapping


## Table of Contents

- [Quick Start: WebUI](#quick-start-webui)
- [Data Preparation](#data-preparation)
- [Full Fine-tuning](#full-fine-tuning)
- [LoRA Fine-tuning](#lora-fine-tuning)
- [Inference](#inference)
- [LoRA Hot-swapping](#lora-hot-swapping)
- [FAQ](#faq)

---

## Quick Start: WebUI

For users who prefer a graphical interface, we provide `lora_ft_webui.py`, a WebUI for LoRA training and inference.

### Launch WebUI

```bash
python lora_ft_webui.py
```

Then open `http://localhost:7860` in your browser.

### Features

- **🚀 Training Tab**: Configure and start LoRA training with an intuitive interface
  - Set training parameters (learning rate, batch size, LoRA rank, etc.)
  - Monitor training progress in real-time
  - Resume training from existing checkpoints

- **🎵 Inference Tab**: Generate audio with trained models
  - Automatic base model loading from LoRA checkpoint config
  - Voice cloning with automatic ASR (reference text recognition)
  - Hot-swap between multiple LoRA models
  - Zero-shot TTS without reference audio

## Data Preparation

Training data should be prepared as a JSONL manifest file, with one sample per line:

```jsonl
{"audio": "path/to/audio1.wav", "text": "Transcript of audio 1."}
{"audio": "path/to/audio2.wav", "text": "Transcript of audio 2."}
{"audio": "path/to/audio3.wav", "text": "Optional duration field.", "duration": 3.5}
{"audio": "path/to/audio4.wav", "text": "Optional dataset_id for multi-dataset.", "dataset_id": 1}
```

### Required Fields

| Field | Description |
|-------|-------------|
| `audio` | Path to audio file (absolute or relative) |
| `text` | Corresponding transcript |

### Optional Fields

| Field | Description |
|-------|-------------|
| `duration` | Audio duration in seconds (speeds up sample filtering) |
| `dataset_id` | Dataset ID for multi-dataset training (default: 0) |

### Requirements

- Audio format: WAV
- Sample rate: 16 kHz for VoxCPM-0.5B, 44.1 kHz for VoxCPM1.5
- Text: Transcript matching the audio content

See `examples/train_data_example.jsonl` for a complete example.
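
As a rough sketch of how such a manifest could be produced and sanity-checked, the snippet below uses only the Python standard library. The helper names (`wav_info`, `build_manifest`, `validate_manifest`) and the `expected_sr` parameter are illustrative assumptions and are not part of VoxCPM. It also assumes PCM WAV files that the `wave` module can read, and that relative paths resolve against the current working directory.

```python
import json
import wave
from pathlib import Path


def wav_info(path):
    """Return (sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        sr = wf.getframerate()
        return sr, wf.getnframes() / sr


def build_manifest(pairs, manifest_path):
    """Write one JSON object per line from (audio_path, transcript) pairs."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, text in pairs:
            _, duration = wav_info(audio_path)
            entry = {
                "audio": str(audio_path),
                "text": text,
                "duration": round(duration, 3),  # optional field
            }
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def validate_manifest(manifest_path, expected_sr=44100):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    with open(manifest_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            for field in ("audio", "text"):
                if not entry.get(field):
                    problems.append(f"line {lineno}: missing '{field}'")
            audio = entry.get("audio")
            if audio and not Path(audio).is_file():
                problems.append(f"line {lineno}: audio file not found: {audio}")
            elif audio:
                sr, _ = wav_info(audio)
                if sr != expected_sr:
                    problems.append(f"line {lineno}: sample rate {sr} != {expected_sr}")
    return problems
```

For VoxCPM-0.5B data, `validate_manifest(path, expected_sr=16000)` would apply the same checks with the other sample rate.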

---

## Full Fine-tuning

Full fine-tuning updates all model parameters. It suits large datasets or cases where significant behavior changes are needed.

### Configuration

Create `conf/voxcpm_v1.5/voxcpm_finetune_all.yaml`:

```yaml
pretrained_path: /path/to/VoxCPM1.5/
train_manifest: /path/to/train.jsonl
val_manifest: ""

sample_rate: 44100
batch_size: 16
grad_accum_steps: 1
num_workers: 2
num_iters: 2000
log_interval: 10
valid_interval: 1000
save_interval: 1000

learning_rate: 0.00001   # Use smaller LR for full fine-tuning
weight_decay: 0.01
warmup_steps: 100
max_steps: 2000
max_batch_tokens: 8192

save_path: /path/to/checkpoints/finetune_all
tensorboard: /path/to/logs/finetune_all

lambdas:
  loss/diff: 1.0
  loss/stop: 1.0
```

### Training

```bash
# Single GPU
python scripts/train_voxcpm_finetune.py --config_path conf/voxcpm_v1.5/voxcpm_finetune_all.yaml

# Multi-GPU
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 \
    scripts/train_voxcpm_finetune.py --config_path conf/voxcpm_v1.5/voxcpm_finetune_all.yaml
```

### Checkpoint Structure

Full fine-tuning saves a complete model directory that can be loaded directly:

```
checkpoints/finetune_all/
└── step_0002000/
    ├── model.safetensors     # Model weights (excluding audio_vae)
    ├── config.json            # Model config
    ├── audiovae.pth           # Audio VAE weights
    ├── tokenizer.json         # Tokenizer
    ├── tokenizer_config.json
    ├── special_tokens_map.json
    ├── optimizer.pth
    └── scheduler.pth
```
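
When several checkpoints have been saved, it can be handy to pick the most recent one programmatically. The following is a minimal sketch that assumes checkpoints follow the `step_<number>` naming shown above; `latest_checkpoint` is a hypothetical helper, not something shipped with the training script.

```python
import re
from pathlib import Path

# Matches directory names such as "step_0002000".
STEP_DIR = re.compile(r"^step_(\d+)$")


def latest_checkpoint(save_path):
    """Return the step_* subdirectory with the highest step number, or None."""
    candidates = []
    for child in Path(save_path).iterdir():
        match = STEP_DIR.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare by the numeric step, not by the directory name.
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path can then be passed wherever a checkpoint directory is expected.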

---

## LoRA Fine-tuning

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that trains only a small number of additional parameters, significantly reducing memory requirements.

### Configuration

Create `conf/voxcpm_v1.5/voxcpm_finetune_lora.yaml`:

```yaml
pretrained_path: /path/to/VoxCPM1.5/
train_manifest: /path/to/train.jsonl
val_manifest: ""

sample_rate: 44100
batch_size: 16
grad_accum_steps: 1
num_workers: 2
num_iters: 2000
log_interval: 10
valid_interval: 1000
save_interval: 1000

learning_rate: 0.0001    # LoRA can use larger LR
weight_decay: 0.01
warmup_steps: 100
max_steps: 2000
max_batch_tokens: 8192

save_path: /path/to/checkpoints/finetune_lora
tensorboard: /path/to/logs/finetune_lora

lambdas:
  loss/diff: 1.0
  loss/stop: 1.0

# LoRA configuration
lora:
  enable_lm: true        # Apply LoRA to Language Model
  enable_dit: true       # Apply LoRA to Diffusion Transformer
  enable_proj: false     # Apply LoRA to projection layers (optional)
  
  r: 32                  # LoRA rank (higher = more capacity)
  alpha: 16              # LoRA alpha, scaling = alpha / r
  dropout: 0.0
  
  # Target modules
  target_modules_lm: ["q_proj", "v_proj", "k_proj", "o_proj"]
  target_modules_dit: ["q_proj", "v_proj", "k_proj", "o_proj"]

# Distribution options (optional)
# hf_model_id: "openbmb/VoxCPM1.5"  # HuggingFace ID
# distribute: true                   # If true, save hf_model_id in lora_config.json
```

### LoRA Parameters

| Parameter | Description | Recommended |
|-----------|-------------|-------------|
| `enable_lm` | Apply LoRA to LM (language model) | `true` |
| `enable_dit` | Apply LoRA to DiT (diffusion model) | `true` (required for voice cloning) |
| `r` | LoRA rank (higher = more capacity) | 16-64 |
| `alpha` | Scaling factor, `scaling = alpha / r` | Usually `r/2` or `r` |
| `target_modules_*` | Layer names to add LoRA | attention layers |

### Distribution Options (Optional)

| Parameter | Description | Default |
|-----------|-------------|---------|
| `hf_model_id` | HuggingFace model ID (e.g., `openbmb/VoxCPM1.5`) | `""` |
| `distribute` | If `true`, save `hf_model_id` as `base_model` in checkpoint; otherwise save local `pretrained_path` | `false` |

> **Note**: If `distribute: true`, `hf_model_id` is required.
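
The rule described in the table and note can be sketched as follows. This is a hypothetical helper that mirrors the documented behavior on a plain dictionary; it is not the training script's actual code, and `resolve_base_model` is an assumed name.

```python
def resolve_base_model(config):
    """Pick the value to store as `base_model`, following the table above."""
    hf_model_id = config.get("hf_model_id", "")
    if config.get("distribute", False):
        if not hf_model_id:
            raise ValueError("distribute: true requires hf_model_id to be set")
        return hf_model_id
    # Default: keep the local path used for training.
    return config["pretrained_path"]
```

Called with `{"pretrained_path": "/path/to/VoxCPM1.5/"}` it returns the local path; adding `"distribute": True` and `"hf_model_id": "openbmb/VoxCPM1.5"` makes it return the HuggingFace ID instead.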

### Training

```bash
# Single GPU
python scripts/train_voxcpm_finetune.py --config_path conf/voxcpm_v1.5/voxcpm_finetune_lora.yaml

# Multi-GPU
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 \
    scripts/train_voxcpm_finetune.py --config_path conf/voxcpm_v1.5/voxcpm_finetune_lora.yaml
```

### Checkpoint Structure

LoRA training saves LoRA parameters and configuration:

```
checkpoints/finetune_lora/
└── step_0002000/
    ├── lora_weights.safetensors    # Only lora_A, lora_B parameters
    ├── lora_config.json            # LoRA config + base model path
    ├── optimizer.pth
    └── scheduler.pth
```

The `lora_config.json` contains:
```json
{
  "base_model": "/path/to/VoxCPM1.5/",
  "lora_config": {
    "enable_lm": true,
    "enable_dit": true,
    "r": 32,
    "alpha": 16,
    ...
  }
}
```

The `base_model` field contains:
- Local path (default): when `distribute: false` or not set
- HuggingFace ID: when `distribute: true` (e.g., `"openbmb/VoxCPM1.5"`)

This allows loading LoRA checkpoints without the original training config file.

---

## Inference

### Full Fine-tuning Inference

The checkpoint directory is a complete model, so you can load it directly:

```bash
python scripts/test_voxcpm_ft_infer.py \
    --ckpt_dir /path/to/checkpoints/finetune_all/step_0002000 \
    --text "Hello, this is the fine-tuned model." \
    --output output.wav
```

With voice cloning:

```bash
python scripts/test_voxcpm_ft_infer.py \
    --ckpt_dir /path/to/checkpoints/finetune_all/step_0002000 \
    --text "This is voice cloning result." \
    --prompt_audio /path/to/reference.wav \
    --prompt_text "Reference audio transcript" \
    --output cloned_output.wav
```

### LoRA Inference

LoRA inference only requires the checkpoint directory (base model path and LoRA config are read from `lora_config.json`):

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "Hello, this is LoRA fine-tuned result." \
    --output lora_output.wav
```

With voice cloning:

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "This is voice cloning with LoRA." \
    --prompt_audio /path/to/reference.wav \
    --prompt_text "Reference audio transcript" \
    --output cloned_output.wav
```

Override base model path (optional):

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --base_model /path/to/another/VoxCPM1.5 \
    --text "Use different base model." \
    --output output.wav
```

---

## LoRA Hot-swapping

LoRA supports dynamic loading, unloading, and switching at inference time without reloading the entire model.

### API Reference

```python
from voxcpm.core import VoxCPM
from voxcpm.model.voxcpm import LoRAConfig

# 1. Load model with LoRA structure and weights
lora_cfg = LoRAConfig(
    enable_lm=True, 
    enable_dit=True, 
    r=32, 
    alpha=16,
    target_modules_lm=["q_proj", "v_proj", "k_proj", "o_proj"],
    target_modules_dit=["q_proj", "v_proj", "k_proj", "o_proj"],
)
model = VoxCPM.from_pretrained(
    hf_model_id="openbmb/VoxCPM1.5",  # or local path
    load_denoiser=False,              # Optional: disable denoiser for faster loading
    optimize=True,                    # Enable torch.compile acceleration
    lora_config=lora_cfg,
    lora_weights_path="/path/to/lora_checkpoint",
)

# 2. Generate audio
audio = model.generate(
    text="Hello, this is LoRA fine-tuned result.",
    prompt_wav_path="/path/to/reference.wav",  # Optional: for voice cloning
    prompt_text="Reference audio transcript",   # Optional: for voice cloning
)

# 3. Disable LoRA (use base model only)
model.set_lora_enabled(False)

# 4. Re-enable LoRA
model.set_lora_enabled(True)

# 5. Unload LoRA (reset weights to zero)
model.unload_lora()

# 6. Hot-swap to another LoRA
loaded, skipped = model.load_lora("/path/to/another_lora_checkpoint")
print(f"Loaded {len(loaded)} params, skipped {len(skipped)}")

# 7. Get current LoRA weights
lora_state = model.get_lora_state_dict()
```

### Simplified Usage (Load from lora_config.json)

If your checkpoint contains `lora_config.json` (saved by the training script), you can load everything automatically:

```python
import json
from voxcpm.core import VoxCPM
from voxcpm.model.voxcpm import LoRAConfig

# Load config from checkpoint
lora_ckpt_dir = "/path/to/checkpoints/finetune_lora/step_0002000"
with open(f"{lora_ckpt_dir}/lora_config.json") as f:
    lora_info = json.load(f)

base_model = lora_info["base_model"]
lora_cfg = LoRAConfig(**lora_info["lora_config"])

# Load model with LoRA
model = VoxCPM.from_pretrained(
    hf_model_id=base_model,
    lora_config=lora_cfg,
    lora_weights_path=lora_ckpt_dir,
)
```

Or use the test script directly:

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "Hello world"
```

### Method Reference

| Method | Description | torch.compile Compatible |
|--------|-------------|--------------------------|
| `load_lora(path)` | Load LoRA weights from file | ✅ |
| `set_lora_enabled(bool)` | Enable/disable LoRA | ✅ |
| `unload_lora()` | Reset LoRA weights to initial values | ✅ |
| `get_lora_state_dict()` | Get current LoRA weights | ✅ |
| `lora_enabled` | Property: check if LoRA is configured | ✅ |

---

## FAQ

### 1. Out of Memory (OOM)

- Increase `grad_accum_steps` (gradient accumulation)
- Decrease `batch_size`
- Use LoRA fine-tuning instead of full fine-tuning
- Decrease `max_batch_tokens` to filter long samples

### 2. Poor LoRA Performance

- Increase `r` (LoRA rank)
- Adjust `alpha` (try `alpha = r/2` or `alpha = r`)
- Increase training steps
- Add more target modules

### 3. Training Not Converging

- Decrease `learning_rate`
- Increase `warmup_steps`
- Check data quality

### 4. LoRA Not Taking Effect at Inference

- Check that `lora_config.json` exists in the checkpoint directory
- Check the return value of `load_lora()`: the second element (the skipped keys) should be empty
- Verify `set_lora_enabled(True)` is called
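
The last two checks can be wrapped in a small helper. This is a sketch only: `report_lora_load` is a hypothetical name, and it assumes `model` exposes `load_lora(path)` returning `(loaded, skipped)` and `set_lora_enabled(bool)`, as used in the hot-swapping example.

```python
def report_lora_load(model, lora_path):
    """Load a LoRA checkpoint and report whether it is likely to take effect.

    Assumes `model.load_lora(path)` returns `(loaded, skipped)` and that
    `model.set_lora_enabled(bool)` exists, as in the hot-swapping example.
    """
    loaded, skipped = model.load_lora(lora_path)
    if skipped:
        print(f"Warning: {len(skipped)} keys were skipped, e.g. {list(skipped)[:3]}")
    if not loaded:
        print("Warning: no LoRA parameters were loaded")
    # Make sure the adapter is active after loading.
    model.set_lora_enabled(True)
    return bool(loaded) and not skipped
```

A return value of `False` suggests that the checkpoint and the configured LoRA structure (rank, target modules) do not match.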

### 5. Checkpoint Loading Errors

- Full fine-tuning: checkpoint directory should contain `model.safetensors` (or `pytorch_model.bin`), `config.json`, `audiovae.pth`
- LoRA: checkpoint directory should contain:
  - `lora_weights.safetensors` (or `lora_weights.ckpt`) - LoRA weights
  - `lora_config.json` - LoRA config and base model path
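
A quick way to check a checkpoint directory against these lists is sketched below. The file names come from the entries above; the helper `missing_checkpoint_files` and the `kind` values `"full"` and `"lora"` are illustrative assumptions.

```python
from pathlib import Path

# File lists taken from the FAQ entry above; each inner tuple lists alternatives.
REQUIRED_FILES = {
    "full": [
        ("model.safetensors", "pytorch_model.bin"),
        ("config.json",),
        ("audiovae.pth",),
    ],
    "lora": [
        ("lora_weights.safetensors", "lora_weights.ckpt"),
        ("lora_config.json",),
    ],
}


def missing_checkpoint_files(ckpt_dir, kind):
    """Return the required entries that are absent from `ckpt_dir`."""
    ckpt_dir = Path(ckpt_dir)
    missing = []
    for alternatives in REQUIRED_FILES[kind]:
        if not any((ckpt_dir / name).is_file() for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing
```

An empty list means every required file (or one of its alternatives) is present.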