diff --git a/README.md b/README.md
index 502c741..104b7b2 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,8 @@
+
+
@@ -40,7 +42,7 @@ VoxCPM is a **tokenizer-free** Text-to-Speech system that directly generates con
- **Ultimate Cloning**: Reproduce every vocal nuance: provide both reference audio and its transcript, and the model continues seamlessly from the reference, faithfully preserving every vocal detail, including timbre, rhythm, emotion, and style (same as VoxCPM1.5)
- **48kHz High-Quality Audio**: Accepts 16kHz reference audio and directly outputs 48kHz studio-quality audio via AudioVAE V2's asymmetric encode/decode design, with built-in super-resolution, so no external upsampler is needed
- 🧠 **Context-Aware Synthesis**: Automatically infers appropriate prosody and expressiveness from text content
-- ⚡ **Real-Time Streaming**: RTF as low as ~0.13 on NVIDIA RTX 4090 by [Nano-VLLM](https://github.com/huggingface/nano-vllm)
+- ⚡ **Real-Time Streaming**: RTF as low as ~0.3 on NVIDIA RTX 4090, and ~0.13 accelerated by [Nano-VLLM](https://github.com/a710128/nanovllm-voxcpm)
- **Fully Open-Source & Commercial-Ready**: Weights and code released under the [Apache-2.0](LICENSE) license, free for commercial use
@@ -53,10 +55,10 @@ Chinese Dialect: 四川话, 粤语, 吴语, 东北话, 河南话, 陕西话, 山
### News
-* **[2026.04]** 🔥 We release **VoxCPM2**: 2B, 30 languages, Voice Design & Controllable Voice Cloning, 48kHz audio output! [Weights](https://huggingface.co/openbmb/VoxCPM2) | [Docs](https://voxcpm.readthedocs.io/en/latest/)
+* **[2026.04]** 🔥 We release **VoxCPM2**: 2B, 30 languages, Voice Design & Controllable Voice Cloning, 48kHz audio output! [Weights](https://huggingface.co/openbmb/VoxCPM2) | [Docs](https://voxcpm.readthedocs.io/en/latest/) | [Playground](https://huggingface.co/spaces/OpenBMB/VoxCPM-Demo)
* **[2025.12]** Open-source **VoxCPM1.5** [weights](https://huggingface.co/openbmb/VoxCPM1.5) with SFT & LoRA fine-tuning. (**#1 GitHub Trending**)
* **[2025.09]** 🔥 Release VoxCPM [Technical Report](https://arxiv.org/abs/2509.24650).
-* **[2025.09]** Open-source **VoxCPM-0.5B** [weights](https://huggingface.co/openbmb/VoxCPM-0.5B) & [Playground](https://huggingface.co/spaces/OpenBMB/VoxCPM-Demo). (**#1 HuggingFace Trending**)
+* **[2025.09]** Open-source **VoxCPM-0.5B** [weights](https://huggingface.co/openbmb/VoxCPM-0.5B) (**#1 HuggingFace Trending**)
---
@@ -181,7 +183,7 @@ voxcpm design \
--text "VoxCPM2 brings studio-quality multilingual speech synthesis." \
--output out.wav
-# Voice design with style control
+# Controllable voice cloning with style control
voxcpm design \
--text "VoxCPM2 brings studio-quality multilingual speech synthesis." \
--control "Young female voice, warm and gentle, slightly smiling" \
@@ -233,7 +235,7 @@ server.stop()
> **RTF as low as ~0.13 on NVIDIA RTX 4090** (vs ~0.3 with the standard PyTorch implementation), with support for batched concurrent requests and a FastAPI HTTP server. See the [Nano-vLLM-VoxCPM repo](https://github.com/a710128/nanovllm-voxcpm) for deployment details.
-> **Full parameter reference, multi-scenario examples, and voice cloning tips →** [Quick Start Guide](https://voxcpm.readthedocs.io/en/latest/quickstart.html) | [Usage Guide & Best Practices](https://voxcpm.readthedocs.io/en/latest/cookbook.html)
+> **Full parameter reference, multi-scenario examples, and voice cloning tips →** [Quick Start Guide](https://voxcpm.readthedocs.io/en/latest/quickstart.html) | [Usage Guide](https://voxcpm.readthedocs.io/en/latest/usage_guide.html) | [Cookbook](https://voxcpm.readthedocs.io/en/latest/cookbook.html)
---
@@ -246,15 +248,15 @@ server.stop()
| **Audio Sample Rate** | 48kHz | 44.1kHz | 16kHz |
| **LM Token Rate** | 6.25Hz | 6.25Hz | 12.5Hz |
| **Languages** | 30 | 2 (zh, en) | 2 (zh, en) |
+| **Cloning Mode** | Isolated Reference & Continuation | Continuation only | Continuation only |
| **Voice Design** | ✅ | ❌ | ❌ |
-| **Style Control** | ✅ | ❌ | ❌ |
-| **Reference Cloning** | Isolated Reference & Continuation | Continuation only | Continuation only |
+| **Controllable Voice Cloning** | ✅ | ❌ | ❌ |
| **SFT / LoRA** | ✅ | ✅ | ✅ |
| **RTF (RTX 4090)** | ~0.30 | ~0.15 | ~0.17 |
| **RTF in Nano-VLLM (RTX 4090)** | ~0.13 | ~0.08 | ~0.10 |
| **VRAM** | ~8 GB | ~6 GB | ~5 GB |
| **Weights** | [🤗 HF](https://huggingface.co/openbmb/VoxCPM2) / [MS](https://modelscope.cn/models/OpenBMB/VoxCPM2) | [🤗 HF](https://huggingface.co/openbmb/VoxCPM1.5) / [MS](https://modelscope.cn/models/OpenBMB/VoxCPM1.5) | [🤗 HF](https://huggingface.co/openbmb/VoxCPM-0.5B) / [MS](https://modelscope.cn/models/OpenBMB/VoxCPM-0.5B) |
-| **Technical Report** | Coming soon | - | [arXiv](https://arxiv.org/abs/2509.24650) |
+| **Technical Report** | Coming soon | - | [arXiv](https://arxiv.org/abs/2509.24650) / [ICLR 2026](https://openreview.net/forum?id=h5KLpGoqzC) |
| **Demo Page** | [Audio Samples](https://openbmb.github.io/voxcpm2-demopage) | - | [Audio Samples](https://openbmb.github.io/VoxCPM-demopage) |
VoxCPM2 is built on a **tokenizer-free, diffusion autoregressive** paradigm. The model operates entirely in the latent space of **AudioVAE V2**, following a four-stage pipeline: **LocEnc → TSLM → RALM → LocDiT**, enabling rich expressiveness and 48kHz native audio output.
@@ -263,7 +265,7 @@ VoxCPM2 is built on a **tokenizer-free, diffusion autoregressive** paradigm. The
-> For full architectural details, VoxCPM2-specific upgrades, and a model comparison table, see the [Architecture & Design Docs](https://voxcpm.readthedocs.io/en/latest/models/version_history.html).
+> For full architectural details, VoxCPM2-specific upgrades, and a model comparison table, see the [Architecture Design](https://voxcpm.readthedocs.io/en/latest/models/architecture.html).
---
@@ -470,7 +472,7 @@ Full documentation: **[voxcpm.readthedocs.io](https://voxcpm.readthedocs.io/en/l
## ⚠️ Risks and Limitations
- **Potential for Misuse:** VoxCPM's voice cloning can generate highly realistic synthetic speech. It is **strictly forbidden** to use VoxCPM for impersonation, fraud, or disinformation. We strongly recommend clearly marking any AI-generated content.
-- **Controllable Generation Stability:** Voice Design and Style Control results can vary between runs; you may need to generate 1-3 times to obtain the desired voice or style. We are actively working on improving controllability consistency.
+- **Controllable Generation Stability:** Voice Design and Controllable Voice Cloning results can vary between runs; you may need to generate 1-3 times to obtain the desired voice or style. We are actively working on improving controllability consistency.
- **Language Coverage:** VoxCPM2 officially supports 30 languages. For languages not on the list, you are welcome to test directly or try fine-tuning on your own data. We plan to expand language coverage in future releases.
- **Usage:** This model is released under the Apache-2.0 license. For production deployments, we recommend conducting thorough testing and safety evaluation tailored to your use case.