docs: add vLLM-Omni serving references
Document vLLM-Omni as a production serving option for VoxCPM2 alongside the existing Nano-vLLM reference. Mirrors the addition in README_zh.md, and adds an ecosystem table entry. Install snippet follows the upstream vLLM-Omni installation guide (from source, since vllm-omni is rapidly evolving).

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
@@ -46,7 +46,7 @@ VoxCPM is a **tokenizer-free** Text-to-Speech system that directly generates con
- 🎙️ **Ultimate Cloning** — Reproduce every vocal nuance: provide both reference audio and its transcript, and the model continues seamlessly from the reference, faithfully preserving every vocal detail — timbre, rhythm, emotion, and style (same as VoxCPM1.5)
- 🔊 **48kHz High-Quality Audio** — Accepts 16kHz reference audio and directly outputs 48kHz studio-quality audio via AudioVAE V2's asymmetric encode/decode design, with built-in super-resolution — no external upsampler needed
- 🧠 **Context-Aware Synthesis** — Automatically infers appropriate prosody and expressiveness from text content
- ⚡ **Real-Time Streaming** — RTF as low as ~0.3 on NVIDIA RTX 4090, and ~0.13 accelerated by [Nano-VLLM](https://github.com/a710128/nanovllm-voxcpm)
- ⚡ **Real-Time Streaming** — RTF as low as ~0.3 on NVIDIA RTX 4090, and ~0.13 accelerated by [Nano-vLLM](https://github.com/a710128/nanovllm-voxcpm) or [vLLM-Omni](https://github.com/vllm-project/vllm-omni) — official vLLM omni-modal serving for VoxCPM2 with PagedAttention and an OpenAI-compatible API
- 📜 **Fully Open-Source & Commercial-Ready** — Weights and code released under the [Apache-2.0](LICENSE) license, free for commercial use
@@ -262,6 +262,32 @@ server.stop()
> **RTF as low as ~0.13 on NVIDIA RTX 4090** (vs ~0.3 with the standard PyTorch implementation), with support for batched concurrent requests and a FastAPI HTTP server. See the [Nano-vLLM-VoxCPM repo](https://github.com/a710128/nanovllm-voxcpm) for deployment details.
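
To put the RTF figures in concrete terms: RTF (real-time factor) is conventionally the synthesis time divided by the duration of the audio produced, so values below 1 are faster than real time. The sketch below works through that arithmetic. The helper names are hypothetical and are not part of VoxCPM or Nano-vLLM.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def expected_synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """Estimate how long it takes to generate audio_seconds of speech at a given RTF."""
    return rtf * audio_seconds


# Hypothetical helper usage: at RTF ~0.13, 10 s of speech needs roughly 1.3 s of compute;
# at RTF ~0.3 the same clip needs roughly 3 s.
fast = expected_synthesis_seconds(0.13, 10.0)
baseline = expected_synthesis_seconds(0.3, 10.0)
```

Both quoted figures are below 1, which is what makes streaming playback feasible: audio is produced faster than it is consumed.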

### 🏭 Production Serving (vLLM-Omni)

For production multi-tenant deployments, use [**vLLM-Omni**](https://github.com/vllm-project/vllm-omni), the official vLLM project's omni-modal extension with native **VoxCPM2** support. It provides a PagedAttention KV cache, continuous batching, and a drop-in **OpenAI-compatible** `/v1/audio/speech` endpoint.

```bash
# Install from source (latest main; vllm-omni is rapidly evolving)
uv pip install vllm==0.19.0 --torch-backend=auto
git clone https://github.com/vllm-project/vllm-omni.git && cd vllm-omni
uv pip install -e .
```

See the [vLLM-Omni installation guide](https://vllm-omni.readthedocs.io/en/latest/getting_started/installation/) for other platforms (ROCm, XPU, MUSA, NPU) and Docker images.

```bash
# Launch an OpenAI-compatible TTS server (--omni enables omni-modal serving)
vllm serve openbmb/VoxCPM2 --omni --port 8000

# Call it with curl (or any OpenAI-compatible client)
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"openbmb/VoxCPM2","input":"Hello from VoxCPM2 on vLLM-Omni!","voice":"default"}' \
  --output out.wav
```
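
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it mirrors the curl call above and assumes the server is running on `localhost:8000` and returns raw audio bytes in the response body. The function names `build_speech_request` and `synthesize` are hypothetical helpers, not part of vLLM-Omni or VoxCPM.

```python
import json
import urllib.request

# Values copied from the curl example; adjust them for your deployment.
BASE_URL = "http://localhost:8000"
MODEL = "openbmb/VoxCPM2"


def build_speech_request(text, voice="default", base_url=BASE_URL, model=MODEL):
    """Build the POST request with the same JSON fields as the curl call."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="out.wav", timeout=120.0):
    """Send the request and write the response body to out_path.

    Assumption: the response body is the audio file itself, as implied by
    `--output out.wav` in the curl example.
    """
    request = build_speech_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Usage (requires a running server):
#   synthesize("Hello from VoxCPM2 on vLLM-Omni!")
```

Separating request construction from the network call keeps the payload easy to inspect without a live server.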

> Built on the upstream vLLM scheduler, with batched concurrent requests, streaming chunk delivery, and multi-GPU deployment out of the box. See the [VoxCPM2 example](https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/voxcpm2) for full deployment recipes.

> **Full parameter reference, multi-scenario examples, and voice cloning tips →** [Quick Start Guide](https://voxcpm.readthedocs.io/en/latest/quickstart.html) | [Usage Guide](https://voxcpm.readthedocs.io/en/latest/usage_guide.html) | [Cookbook](https://voxcpm.readthedocs.io/en/latest/cookbook.html)

---
@@ -528,6 +554,7 @@ Full documentation: **[voxcpm.readthedocs.io](https://voxcpm.readthedocs.io/en/l
| Project | Description |
|---|---|
| [**Nano-vLLM**](https://github.com/a710128/nanovllm-voxcpm) | High-throughput, fast GPU serving |
| [**vLLM-Omni**](https://github.com/vllm-project/vllm-omni) | Official vLLM omni-modal serving for VoxCPM2 — PagedAttention, OpenAI-compatible API |
| [**VoxCPM.cpp**](https://github.com/bluryar/VoxCPM.cpp) | GGML/GGUF: CPU, CUDA, Vulkan inference |
| [**VoxCPM-ONNX**](https://github.com/bluryar/VoxCPM-ONNX) | ONNX export for CPU inference |
| [**VoxCPMANE**](https://github.com/0seba/VoxCPMANE) | Apple Neural Engine backend |