diff --git a/README.md b/README.md
index ffd0c8a..d6e6467 100644
--- a/README.md
+++ b/README.md
@@ -415,10 +415,54 @@ VoxCPM2 achieves state-of-the-art or comparable results on public zero-shot and
+
+### Internal 30-Language ASR Benchmark
+
+We additionally run an internal multilingual intelligibility benchmark with **30 languages × 500 samples**. ASR transcription is evaluated via **Gemini 3.1 Flash Lite API**.
+
+
+Internal 30-Language ASR Benchmark (click to expand)
+
+| Language | Metric | VoxCPM2 | Fish S2-Pro |
+|---|---:|---:|---:|
+| ar (Arabic) | CER | 1.23% | 0.30% |
+| da (Danish) | WER | 2.70% | 3.52% |
+| de (German) | WER | 0.96% | 0.64% |
+| el (Greek) | WER | 3.17% | 4.61% |
+| en (English) | WER | 0.42% | 1.03% |
+| es (Spanish) | WER | 1.33% | 0.64% |
+| fi (Finnish) | WER | 2.24% | 2.80% |
+| fr (French) | WER | 2.16% | 2.34% |
+| he (Hebrew) | CER | 2.98% | 15.27% |
+| hi (Hindi) | CER | 0.79% | 0.91% |
+| id (Indonesian) | WER | 1.36% | 1.68% |
+| it (Italian) | WER | 1.65% | 1.08% |
+| ja (Japanese) | CER | 2.40% | 1.82% |
+| km (Khmer) | CER | 2.05% | 75.15% |
+| ko (Korean) | CER | 0.95% | 0.29% |
+| lo (Lao) | CER | 1.90% | 87.40% |
+| ms (Malay) | WER | 1.75% | 1.41% |
+| my (Burmese) | CER | 1.42% | 85.27% |
+| nl (Dutch) | WER | 1.25% | 1.68% |
+| no (Norwegian) | WER | 2.49% | 3.76% |
+| pl (Polish) | WER | 1.90% | 1.65% |
+| pt (Portuguese) | WER | 1.48% | 1.49% |
+| ru (Russian) | WER | 0.90% | 0.86% |
+| sv (Swedish) | WER | 2.22% | 2.63% |
+| sw (Swahili) | CER | 1.07% | 2.02% |
+| th (Thai) | CER | 0.94% | 1.92% |
+| tl (Tagalog) | WER | 2.63% | 4.00% |
+| tr (Turkish) | WER | 1.65% | 1.65% |
+| vi (Vietnamese) | WER | 1.56% | 5.56% |
+| zh (Chinese) | CER | 0.92% | 1.02% |
+| Average (30 languages) | | **1.68%** | - |
+
+
+
 ### InstructTTSEval
-Instruction-Guided Voice Design Results
+Instruction-Guided Voice Design Results (click to expand)
 | Model | InstructTTSEval-ZH | | | InstructTTSEval-EN | | |
 |-------|:---:|:----:|:----:|:----:|:----:|:----:|
diff --git a/README_zh.md b/README_zh.md
index 6cc3b6b..89e1f43 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -414,10 +414,54 @@ VoxCPM2 在公开的零样本和可控 TTS 基准测试中取得了 SOTA 或可
+### Internal 30-Language ASR Benchmark
+
+我们额外进行了内部多语言可懂度评测:**30 语种 × 500 样本**,ASR 转写评估使用 **Gemini 3.1 Flash Lite API**。
+
+
+内部30语种评测集ASR结果(点击展开)
+
+| 语言 | 指标 | VoxCPM2 | Fish S2-Pro |
+|---|---:|---:|---:|
+| ar (阿拉伯语) | CER | 1.23% | 0.30% |
+| da (丹麦语) | WER | 2.70% | 3.52% |
+| de (德语) | WER | 0.96% | 0.64% |
+| el (希腊语) | WER | 3.17% | 4.61% |
+| en (英语) | WER | 0.42% | 1.03% |
+| es (西班牙语) | WER | 1.33% | 0.64% |
+| fi (芬兰语) | WER | 2.24% | 2.80% |
+| fr (法语) | WER | 2.16% | 2.34% |
+| he (希伯来语) | CER | 2.98% | 15.27% |
+| hi (印地语) | CER | 0.79% | 0.91% |
+| id (印尼语) | WER | 1.36% | 1.68% |
+| it (意大利语) | WER | 1.65% | 1.08% |
+| ja (日语) | CER | 2.40% | 1.82% |
+| km (高棉语) | CER | 2.05% | 75.15% |
+| ko (韩语) | CER | 0.95% | 0.29% |
+| lo (老挝语) | CER | 1.90% | 87.40% |
+| ms (马来语) | WER | 1.75% | 1.41% |
+| my (缅甸语) | CER | 1.42% | 85.27% |
+| nl (荷兰语) | WER | 1.25% | 1.68% |
+| no (挪威语) | WER | 2.49% | 3.76% |
+| pl (波兰语) | WER | 1.90% | 1.65% |
+| pt (葡萄牙语) | WER | 1.48% | 1.49% |
+| ru (俄语) | WER | 0.90% | 0.86% |
+| sv (瑞典语) | WER | 2.22% | 2.63% |
+| sw (斯瓦希里语) | CER | 1.07% | 2.02% |
+| th (泰语) | CER | 0.94% | 1.92% |
+| tl (菲律宾语) | WER | 2.63% | 4.00% |
+| tr (土耳其语) | WER | 1.65% | 1.65% |
+| vi (越南语) | WER | 1.56% | 5.56% |
+| zh (中文) | CER | 0.92% | 1.02% |
+| 平均(30 语种) | | **1.68%** | - |
+
+
+
+
 ### InstructTTSEval
-指令驱动音色设计结果
+指令驱动音色设计结果 (点击展开)
 | Model | InstructTTSEval-ZH | | | InstructTTSEval-EN | | |
 |-------|:---:|:----:|:----:|:----:|:----:|:----:|
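
Reviewer note: the Metric column in the added table switches between WER and CER by language. The sketch below shows how such scores are conventionally computed from a reference text and an ASR transcript, using Levenshtein distance over words or characters. It is not the project's evaluation code: the names `edit_distance`, `error_rate`, and `macro_average` are hypothetical, and the normalization (whitespace tokenization, no case folding or punctuation stripping) is an assumption, since the diff does not describe it. The last part checks that the plain arithmetic mean of the 30 VoxCPM2 values copied from the table agrees with the reported 1.68% average.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str, unit: str) -> float:
    """Return WER for unit='word' or CER for unit='char'.

    Assumption: words are split on whitespace and CER ignores whitespace.
    The benchmark's real text normalization is not stated in the diff.
    """
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    elif unit == "char":
        ref = [c for c in reference if not c.isspace()]
        hyp = [c for c in hypothesis if not c.isspace()]
    else:
        raise ValueError("unit must be 'word' or 'char'")
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def macro_average(scores: List[float]) -> float:
    """Unweighted mean over languages."""
    return sum(scores) / len(scores)


# VoxCPM2 column of the table above, in percent, in row order (ar ... zh).
VOXCPM2_SCORES = [
    1.23, 2.70, 0.96, 3.17, 0.42, 1.33, 2.24, 2.16, 2.98, 0.79,
    1.36, 1.65, 2.40, 2.05, 0.95, 1.90, 1.75, 1.42, 1.25, 2.49,
    1.90, 1.48, 0.90, 2.22, 1.07, 0.94, 2.63, 1.65, 1.56, 0.92,
]

if __name__ == "__main__":
    # One inserted word against a three-word reference: WER = 1/3.
    print(error_rate("the cat sat", "the cat sat down", unit="word"))
    # Mean of the 30 per-language values, rounded to two decimals.
    print(round(macro_average(VOXCPM2_SCORES), 2))  # 1.68
```

Because the error rate divides by the reference length, insertions can push it above 100%, so very high per-language values are not capped by the metric itself.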