Labmem-Zhouyx
2026-04-06 16:10:50 +08:00
parent 07b8b5c01f
commit 746631c38d
+40 -34
@@ -3,7 +3,7 @@
<p align="center">
<a href="https://github.com/OpenBMB/VoxCPM/"><img src="https://img.shields.io/badge/Project%20Page-GitHub-blue" alt="Project Page"></a>
<a href="https://huggingface.co/spaces/OpenBMB/VoxCPM-Demo"><img src="https://img.shields.io/badge/Live%20Playground-Demo-orange" alt="Live Playground"></a>
-<a href="https://voxcpm.readthedocs.io/en/dev_2.0/"><img src="https://img.shields.io/badge/Docs-ReadTheDocs-8CA1AF" alt="Documentation"></a>
+<a href="https://voxcpm.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Docs-ReadTheDocs-8CA1AF" alt="Documentation"></a>
<a href="https://huggingface.co/openbmb/VoxCPM2"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-VoxCPM2-yellow" alt="Hugging Face"></a>
<a href="https://modelscope.cn/models/OpenBMB/VoxCPM2"><img src="https://img.shields.io/badge/ModelScope-VoxCPM2-purple" alt="ModelScope"></a>
</p>
@@ -43,7 +43,7 @@ Chinese Dialect: 四川话, 粤语, 吴语, 东北话, 河南话, 陕西话, 山
### News
-* **[2026.04]** 🔥 We release **VoxCPM2** — 2B, 30 languages, Voice Design & Controllable Voice Cloning, 48kHz audio output! [Weights](https://huggingface.co/openbmb/VoxCPM2) | [Docs](https://voxcpm.readthedocs.io/en/dev_2.0/)
+* **[2026.04]** 🔥 We release **VoxCPM2** — 2B, 30 languages, Voice Design & Controllable Voice Cloning, 48kHz audio output! [Weights](https://huggingface.co/openbmb/VoxCPM2) | [Docs](https://voxcpm.readthedocs.io/en/latest/)
* **[2025.12]** 🎉 Open-source **VoxCPM1.5** [weights](https://huggingface.co/openbmb/VoxCPM1.5) with SFT & LoRA fine-tuning. (**🏆 #1 GitHub Trending**)
* **[2025.09]** 🔥 Release VoxCPM [Technical Report](https://arxiv.org/abs/2509.24650).
* **[2025.09]** 🎉 Open-source **VoxCPM-0.5B** [weights](https://huggingface.co/openbmb/VoxCPM-0.5B) & [Playground](https://huggingface.co/spaces/OpenBMB/VoxCPM-Demo). (**🏆 #1 HuggingFace Trending**)
@@ -73,10 +73,10 @@ Chinese Dialect: 四川话, 粤语, 吴语, 东北话, 河南话, 陕西话, 山
### Installation
```sh
-pip install voxcpm
+uv pip install voxcpm
```
-> **Requirements:** Python ≥ 3.10, PyTorch ≥ 2.5.0, CUDA ≥ 12.0. See [Quick Start Docs](https://voxcpm.readthedocs.io/en/dev_2.0/quickstart.html) for details.
+> **Requirements:** Python ≥ 3.10, PyTorch ≥ 2.5.0, CUDA ≥ 12.0. See [Quick Start Docs](https://voxcpm.readthedocs.io/en/latest/quickstart.html) for details.
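The requirements line sets three minimum versions. You can check them with a small standard-library script like the one below. It compares numeric version tuples rather than strings, so `2.10` correctly counts as newer than `2.5`. If PyTorch is installed, the script reads `torch.__version__` and `torch.version.cuda`.

This is a sketch under stated assumptions:
- It interprets "CUDA ≥ 12.0" as the CUDA version PyTorch was built against.
- `parse_version`, `at_least`, and `check_requirements` are hypothetical helpers, not part of `voxcpm`.

```python
import sys


def parse_version(version: str) -> tuple:
    """Turn strings like '2.5.1+cu121' or '12.4' into comparable int tuples."""
    core = version.split("+")[0]  # drop a local build suffix such as '+cu121'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # pre-release tags are ignored, so '2.5.0rc1' counts as 2.5.0 (lenient)
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def at_least(found: str, minimum: str) -> bool:
    """True if `found` >= `minimum`, padding the shorter tuple with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements() -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} < 3.10")
    try:
        import torch  # only inspected when it is installed
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not at_least(torch.__version__, "2.5.0"):
        problems.append(f"PyTorch {torch.__version__} < 2.5.0")
    cuda = torch.version.cuda  # CUDA version of the build; None for CPU-only builds
    if cuda is None or not at_least(cuda, "12.0"):
        problems.append(f"PyTorch CUDA build {cuda} does not meet >= 12.0")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```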
### Python API
@@ -86,14 +86,18 @@ pip install voxcpm
from voxcpm import VoxCPM
import soundfile as sf
-model = VoxCPM.from_pretrained("openbmb/VoxCPM2")
+model = VoxCPM.from_pretrained(
+"openbmb/VoxCPM2",
+load_denoiser=False,
+)
wav = model.generate(
-text="VoxCPM2 brings multilingual support, creative voice design, and controllable voice cloning.",
+text="VoxCPM2 is the current recommended release for realistic multilingual speech synthesis.",
cfg_value=2.0,
inference_timesteps=10,
)
-sf.write("output.wav", wav, 48000)
+sf.write("demo.wav", wav, model.tts_model.sample_rate)
+print("saved: demo.wav")
```
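The change above replaces the hard-coded `48000` with `model.tts_model.sample_rate`. The rate written into a file header must match the rate the samples were generated at; otherwise players stretch or squeeze the audio.

The sketch below illustrates this with two substitutions:
- It uses the standard-library `wave` module instead of `soundfile`.
- It writes a synthetic tone instead of VoxCPM2 output.

`write_wav` is a hypothetical helper.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path: str, samples, sample_rate: int) -> float:
    """Write mono float samples in [-1, 1] as 16-bit PCM and return the duration in seconds."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)  # header field players use to time playback
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clipped))
    return len(samples) / sample_rate


# One second of a 440 Hz tone, synthesized at 48 kHz.
rate = 48000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
out_dir = tempfile.gettempdir()

print(write_wav(os.path.join(out_dir, "right_rate.wav"), tone, rate))   # 1.0: plays as synthesized
print(write_wav(os.path.join(out_dir, "wrong_rate.wav"), tone, 16000))  # 3.0: 3x slower, pitch drops to ~147 Hz
```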
#### 🎨 Voice Design
@@ -102,34 +106,36 @@ Create a voice from a natural-language description — no reference audio needed
```python
wav = model.generate(
-text="(A warm, gentle female voice in her 30s with a calm and soothing tone)"
-"Welcome to VoxCPM2, the next generation of realistic speech synthesis.",
+text="(A young woman, gentle and sweet voice)Hello, welcome to VoxCPM2!",
cfg_value=2.0,
inference_timesteps=10,
)
-sf.write("voice_design.wav", wav, 48000)
+sf.write("voice_design.wav", wav, model.tts_model.sample_rate)
```
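In the Voice Design example, and in the style-controlled cloning example further down, the instruction sits in parentheses directly in front of the text to speak, with no separator. The hypothetical helper below builds such a string; its result would be passed as `text=` to `model.generate`.

Two assumptions apply:
- The format is inferred from the README examples, not from a documented specification.
- Rejecting parentheses inside the instruction is an extra, conservative guard.

```python
def with_control(instruction: str, text: str) -> str:
    """Prefix `text` with a parenthesized control instruction, e.g. '(calm tone)Hello'."""
    instruction = instruction.strip()
    if not instruction:
        return text  # no instruction: plain text-to-speech input
    if "(" in instruction or ")" in instruction:
        # Assumption: nested parentheses could blur where the instruction ends.
        raise ValueError("instruction must not contain parentheses")
    return f"({instruction}){text}"


prompt = with_control("A young woman, gentle and sweet voice", "Hello, welcome to VoxCPM2!")
print(prompt)  # (A young woman, gentle and sweet voice)Hello, welcome to VoxCPM2!
```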
-#### 🎛️ Controllable Cloning
+#### 🎛️ Controllable Voice Cloning
-Clone any voice from a short reference clip, with optional style guidance:
+Provide a reference audio clip. The model clones its timbre, and you can still use control instructions to adjust speed, emotion, or style.
```python
wav = model.generate(
-text="This is a voice cloning demonstration using VoxCPM2.",
-reference_wav_path="speaker_reference.wav",
+text="This is a cloned voice generated by VoxCPM2.",
+reference_wav_path="speaker.wav",
)
-sf.write("cloned.wav", wav, 48000)
+sf.write("clone.wav", wav, model.tts_model.sample_rate)
wav = model.generate(
-text="(Speaking slowly with a whispering, mysterious tone)"
-"The secret lies hidden in the ancient library, waiting to be discovered.",
-reference_wav_path="speaker_reference.wav",
+text="(slightly faster, cheerful tone)This is a cloned voice with style control.",
+reference_wav_path="speaker.wav",
cfg_value=2.0,
inference_timesteps=10,
)
-sf.write("style_control.wav", wav, 48000)
+sf.write("controllable_clone.wav", wav, model.tts_model.sample_rate)
```
#### 🎙️ Ultimate Cloning
-Provide both the reference audio and its transcript for audio-continuation-based cloning with every vocal nuance reproduced. For maximum cloning similarity, pass the same reference clip to both `reference_wav_path` and `prompt_wav_path` as shown below:
+Provide both the reference audio and its exact transcript for audio-continuation-based cloning with every vocal nuance reproduced. For maximum cloning similarity, pass the same reference clip to both `reference_wav_path` and `prompt_wav_path` as shown below:
```python
wav = model.generate(
@@ -138,7 +144,7 @@ wav = model.generate(
prompt_text="The transcript of the reference audio.",
reference_wav_path="speaker_reference.wav",
)
-sf.write("ultimate_cloned.wav", wav, 48000)
+sf.write("hifi_clone.wav", wav, model.tts_model.sample_rate)
```
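The three generation modes differ only in which keyword arguments go to `model.generate`:
- Voice Design uses `text` alone.
- Controllable Voice Cloning adds `reference_wav_path`.
- Ultimate Cloning also passes the same clip as `prompt_wav_path`, together with its transcript in `prompt_text`.

The hypothetical helper below encodes that mapping. The argument names come from the examples above. The defaults `cfg_value=2.0` and `inference_timesteps=10` only mirror the values used there and may not be the library's own defaults.

```python
def generate_kwargs(text, reference_wav=None, prompt_text=None,
                    cfg_value=2.0, inference_timesteps=10):
    """Build keyword arguments for the three modes in the README (hypothetical helper)."""
    kwargs = {"text": text, "cfg_value": cfg_value, "inference_timesteps": inference_timesteps}
    if prompt_text is not None:
        # Ultimate cloning: the same clip is both continuation prompt and timbre reference.
        if reference_wav is None:
            raise ValueError("prompt_text needs the reference clip it transcribes")
        if not prompt_text.strip():
            raise ValueError("prompt_text must be the exact transcript, not empty")
        kwargs["prompt_wav_path"] = reference_wav
        kwargs["prompt_text"] = prompt_text
    if reference_wav is not None:
        # Controllable (and ultimate) cloning: clone the timbre of this clip.
        kwargs["reference_wav_path"] = reference_wav
    return kwargs


design = generate_kwargs("(A young woman, gentle and sweet voice)Hello, welcome to VoxCPM2!")
clone = generate_kwargs("This is a cloned voice generated by VoxCPM2.", reference_wav="speaker.wav")
hifi = generate_kwargs("Text to speak.", reference_wav="speaker.wav",
                       prompt_text="The transcript of the reference audio.")
print(sorted(hifi))
# ['cfg_value', 'inference_timesteps', 'prompt_text', 'prompt_wav_path', 'reference_wav_path', 'text']
```

Each dict could then be unpacked as `model.generate(**kwargs)`.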
<details>
@@ -153,7 +159,7 @@ for chunk in model.generate_streaming(
):
chunks.append(chunk)
wav = np.concatenate(chunks)
-sf.write("streaming.wav", wav, 48000)
+sf.write("streaming.wav", wav, model.tts_model.sample_rate)
```
</details>
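The streaming example collects every chunk and concatenates them once at the end. For long outputs, you can instead write each chunk to the output file as it arrives, so memory use stays bounded. The sketch below shows that pattern with the standard-library `wave` module.

Both functions are illustrative:
- `fake_stream` is a hypothetical stand-in for `model.generate_streaming`, whose arguments are elided in this hunk. It yields lists of float samples.
- `stream_to_wav` is likewise a hypothetical helper.

```python
import math
import os
import struct
import tempfile
import wave


def fake_stream(total_samples: int, chunk_size: int, rate: int):
    """Hypothetical stand-in for model.generate_streaming: yields lists of float samples."""
    for start in range(0, total_samples, chunk_size):
        stop = min(start + chunk_size, total_samples)
        yield [0.2 * math.sin(2 * math.pi * 220 * n / rate) for n in range(start, stop)]


def stream_to_wav(chunks, path: str, sample_rate: int) -> int:
    """Append each chunk to a 16-bit mono WAV as it arrives; return the frames written."""
    frames = 0
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        for chunk in chunks:
            pcm = b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in chunk)
            f.writeframes(pcm)  # written before the next chunk is requested
            frames += len(chunk)
    return frames


rate = 48000
path = os.path.join(tempfile.gettempdir(), "streaming_sketch.wav")
written = stream_to_wav(fake_stream(rate // 2, 4800, rate), path, rate)
print(written, written / rate)  # 24000 0.5
```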
@@ -215,9 +221,9 @@ sf.write("out.wav", np.concatenate(chunks), 48000)
server.stop()
```
-> **RTF as low as ~0.13 on NVIDIA RTX 4090** (vs ~0.15 with the standard PyTorch implementation), with support for batched concurrent requests and a FastAPI HTTP server. See the [Nano-vLLM-VoxCPM repo](https://github.com/a710128/nanovllm-voxcpm) for deployment details.
+> **RTF as low as ~0.13 on NVIDIA RTX 4090** (vs ~0.3 with the standard PyTorch implementation), with support for batched concurrent requests and a FastAPI HTTP server. See the [Nano-vLLM-VoxCPM repo](https://github.com/a710128/nanovllm-voxcpm) for deployment details.
-> **Full parameter reference, multi-scenario examples, and voice cloning tips →** [Quick Start Guide](https://voxcpm.readthedocs.io/en/dev_2.0/quickstart.html) | [Usage Guide & Best Practices](https://voxcpm.readthedocs.io/en/dev_2.0/chefsguide.html)
+> **Full parameter reference, multi-scenario examples, and voice cloning tips →** [Quick Start Guide](https://voxcpm.readthedocs.io/en/latest/quickstart.html) | [Usage Guide & Best Practices](https://voxcpm.readthedocs.io/en/latest/cookbook.html)
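RTF (real-time factor) is the wall-clock synthesis time divided by the duration of the audio produced. Lower is faster, and anything below 1 is faster than real time. With the updated figures (~0.13 vs ~0.3), Nano-vLLM-VoxCPM is roughly 0.3 / 0.13 ≈ 2.3 times faster than the standard PyTorch path.

This minimal sketch shows the arithmetic and one way to measure RTF. `measure_rtf` accepts any callable that returns samples. Wrapping `model.generate` in a lambda to benchmark it is an assumption about usage, not a documented API.

```python
import time


def real_time_factor(processing_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = wall-clock synthesis time / duration of the audio produced (lower is faster)."""
    return processing_seconds / (num_samples / sample_rate)


def measure_rtf(synthesize, sample_rate: int) -> float:
    """Time one call of `synthesize()` (any callable returning samples) and report its RTF."""
    start = time.perf_counter()
    samples = synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


# Worked example with the quoted figures: expected wall-clock time for a 10 s clip.
for label, rtf in [("Nano-vLLM-VoxCPM", 0.13), ("standard PyTorch", 0.3)]:
    print(f"{label}: ~{10.0 * rtf:.1f} s")
# Nano-vLLM-VoxCPM: ~1.3 s
# standard PyTorch: ~3.0 s
```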
---
@@ -247,7 +253,7 @@ VoxCPM2 is built on a **tokenizer-free, diffusion autoregressive** paradigm. The
<img src="assets/voxcpm_model.png" alt="VoxCPM2 Model Architecture" width="90%">
</div>
-> For full architectural details, VoxCPM2-specific upgrades, and a model comparison table, see the [Architecture & Design Docs](https://voxcpm.readthedocs.io/en/dev_2.0/models/version_history.html).
+> For full architectural details, VoxCPM2-specific upgrades, and a model comparison table, see the [Architecture & Design Docs](https://voxcpm.readthedocs.io/en/latest/models/version_history.html).
---
@@ -324,7 +330,7 @@ VoxCPM2 achieves state-of-the-art or comparable results on public zero-shot TTS
| Hindi | 6.962 | **5.827** | | 14.640 | 19.699 |
| Indonesian | 1.237 | **1.059** | | 1.460 | 1.084 |
| Italian | 1.543 | 1.743 | **0.948** | 1.270 | 1.563 |
-| Japanese | 3.519 | 10.646 | 3.823 | **2.760** | 4.083 |
+| Japanese | 3.519 | 10.646 | 3.823 | **2.760** | 4.628 |
| Korean | 1.747 | 1.865 | 1.755 | **1.180** | 1.962 |
| Polish | 1.415 | **0.766** | | 1.260 | 1.141 |
| Portuguese | 1.877 | 1.331 | 1.526 | **1.140** | 1.938 |
@@ -416,21 +422,21 @@ python scripts/train_voxcpm_finetune.py \
python lora_ft_webui.py # then open http://localhost:7860
```
-> **Full guide →** [Fine-tuning Guide](https://voxcpm.readthedocs.io/en/dev_2.0/finetuning/finetune.html) (data preparation, configuration, training, LoRA hot-swapping, FAQ)
+> **Full guide →** [Fine-tuning Guide](https://voxcpm.readthedocs.io/en/latest/finetuning/finetune.html) (data preparation, configuration, training, LoRA hot-swapping, FAQ)
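The Fine-tuning Guide mentions LoRA hot-swapping. In generic LoRA, a trained adapter is a low-rank pair (B, A). Its update (alpha / r) * B @ A is added to a frozen weight W, so swapping adapters means swapping which delta is added.

The pure-Python sketch below shows that arithmetic on a 2x2 matrix:
- It is the textbook LoRA formulation, not VoxCPM's implementation.
- The adapter names and values are made up.
- Recomputing from the untouched base is only one of several possible swapping strategies.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(b, a, alpha: float):
    """Low-rank update (alpha / r) * B @ A; A is r x in_features, B is out_features x r."""
    scale = alpha / len(a)  # r = number of rows of A
    return [[scale * x for x in row] for row in matmul(b, a)]


def effective_weight(base, adapter=None):
    """Base weight plus one adapter's delta; the frozen base itself is never modified."""
    if adapter is None:
        return [row[:] for row in base]
    delta = lora_delta(*adapter)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(base, delta)]


base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {  # hypothetical rank-1 adapters: (B, A, alpha)
    "speaker_a": ([[1.0], [2.0]], [[0.5, 0.5]], 2.0),
    "speaker_b": ([[0.0], [1.0]], [[1.0, 0.0]], 1.0),
}
for name in ["speaker_a", "speaker_b", None]:  # "hot-swap" by changing which delta is added
    print(name, effective_weight(base, adapters.get(name)))
```

Recomputing each effective weight from the untouched base avoids the rounding drift that repeated in-place merge and unmerge steps can accumulate.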
---
## 📚 Documentation
-Full documentation: **[voxcpm.readthedocs.io](https://voxcpm.readthedocs.io/en/dev_2.0/)**
+Full documentation: **[voxcpm.readthedocs.io](https://voxcpm.readthedocs.io/en/latest/)**
| Topic | Link |
|---|---|
-| Quick Start & Installation | [Quick Start](https://voxcpm.readthedocs.io/en/dev_2.0/quickstart.html) |
-| Usage Guide & Cookbook | [User Guide](https://voxcpm.readthedocs.io/en/dev_2.0/usage_guide.html) |
-| VoxCPM Series | [Models](https://voxcpm.readthedocs.io/en/dev_2.0/models/version_history.html) |
-| Fine-tuning (SFT & LoRA) | [Fine-tuning Guide](https://voxcpm.readthedocs.io/en/dev_2.0/finetuning/finetune.html) |
-| FAQ & Troubleshooting | [FAQ](https://voxcpm.readthedocs.io/en/dev_2.0/faq.html) |
+| Quick Start & Installation | [Quick Start](https://voxcpm.readthedocs.io/en/latest/quickstart.html) |
+| Usage Guide & Cookbook | [User Guide](https://voxcpm.readthedocs.io/en/latest/usage_guide.html) |
+| VoxCPM Series | [Models](https://voxcpm.readthedocs.io/en/latest/models/version_history.html) |
+| Fine-tuning (SFT & LoRA) | [Fine-tuning Guide](https://voxcpm.readthedocs.io/en/latest/finetuning/finetune.html) |
+| FAQ & Troubleshooting | [FAQ](https://voxcpm.readthedocs.io/en/latest/faq.html) |
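Most of this commit is a mechanical migration of ReadTheDocs links from `/en/dev_2.0/` to `/en/latest/`, plus one page rename (`chefsguide.html` to `cookbook.html`). A rewrite like that could be scripted roughly as follows. The regex and the `RENAMED_PAGES` table are assumptions drawn from the lines in this diff, not a tool from the repository.

```python
import re

# Matches the docs prefix and captures everything up to the version segment.
DOCS_PREFIX = re.compile(r"(https://voxcpm\.readthedocs\.io/en/)dev_2\.0/")
# Page renames observed in this commit, beyond the version segment itself.
RENAMED_PAGES = {"chefsguide.html": "cookbook.html"}


def migrate_docs_links(markdown: str) -> str:
    """Point ReadTheDocs links at /en/latest/ and apply the known page renames."""
    out = DOCS_PREFIX.sub(r"\g<1>latest/", markdown)
    for old, new in RENAMED_PAGES.items():
        out = out.replace(f"/en/latest/{old}", f"/en/latest/{new}")
    return out


sample = ("[FAQ](https://voxcpm.readthedocs.io/en/dev_2.0/faq.html) | "
          "[Guide](https://voxcpm.readthedocs.io/en/dev_2.0/chefsguide.html)")
print(migrate_docs_links(sample))
# [FAQ](https://voxcpm.readthedocs.io/en/latest/faq.html) | [Guide](https://voxcpm.readthedocs.io/en/latest/cookbook.html)
```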
---
@@ -447,7 +453,7 @@ Full documentation: **[voxcpm.readthedocs.io](https://voxcpm.readthedocs.io/en/d
| [**ComfyUI-VoxCPMTTS**](https://github.com/1038lab/ComfyUI-VoxCPMTTS) | ComfyUI TTS extension |
| [**TTS WebUI**](https://github.com/rsxdalv/tts_webui_extension.vox_cpm) | Browser-based TTS extension |
-> See the full [Ecosystem](https://voxcpm.readthedocs.io/en/dev_2.0/) in the docs. Community projects are not officially maintained by OpenBMB. Built something cool? [Open an issue or PR](https://github.com/OpenBMB/VoxCPM/issues) to add it!
+> See the full [Ecosystem](https://voxcpm.readthedocs.io/en/latest/) in the docs. Community projects are not officially maintained by OpenBMB. Built something cool? [Open an issue or PR](https://github.com/OpenBMB/VoxCPM/issues) to add it!
---