Zonos-v0.1-transformer #NPU

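Zonos-v0.1 is an open-source TTS model released by Zyphra, trained on more than 200,000 hours of multilingual speech. This repository provides a port to the **Huawei Ascend NPU (Ascend 910)** that supports NPU inference. In the runs reported below, the maximum relative difference from the CPU reference ranged from 0.0018% to 0.033%, well under the 1% tolerance.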

Ascend NPU adaptation notes

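Item	Details
Model	Zonos-v0.1-transformer (1.6B parameters)
Framework	PyTorch 2.9.0 + torch_npu
Hardware	Ascend 910 / Atlas 800T A2
Accuracy	Max relative difference vs. CPU baseline of 0.0018% to 0.033% in the reported runs (tolerance: 1%)
NPU count	2 cards available
Inference speed	~17-19 tokens/s (NPU forward pass: 23.3 ms avg, about 140x faster than CPU)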

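Model architecture

Zonos uses a simple architecture: text is normalized and converted to phonemes with eSpeak, and a Transformer backbone then predicts DAC tokens.

Transformer backbone: 26 layers, d_model=2048, 16-head GQA attention
Conditioning inputs: speaker embedding, emotion, speaking rate, pitch standard deviation, maximum frequency, language ID
Output: DAC tokens for 9 codebooks (44 kHz native sampling rate)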

Requirements

Hardware: Huawei Ascend 910 / Atlas 800T A2 (2 cards recommended)
OS: Linux (Ubuntu 22.04/24.04)
Software:
- Python 3.11+
- PyTorch 2.9.0+ with torch_npu
- CANN 8.0+ / Ascend HDK

# Check NPU status
npu-smi info

# Check torch_npu
python3 -c "import torch_npu; print('NPU count:', torch_npu.npu.device_count())"
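
The same check can be folded into a small Python helper that falls back to the CPU when the Ascend stack is unavailable. This is a minimal sketch and not part of the repository: `pick_device` is a hypothetical name, and the only torch_npu call it relies on is the `torch_npu.npu.device_count()` used in the command above.

```python
def pick_device() -> str:
    """Return "npu:0" if torch_npu reports at least one NPU, otherwise "cpu"."""
    try:
        import torch_npu  # fails if the package or its CANN runtime is missing
        count = torch_npu.npu.device_count()
    except Exception:
        # Treat any import or runtime failure as "no NPU available".
        return "cpu"
    return "npu:0" if count > 0 else "cpu"


print("Using device:", pick_device())
```

On a machine without torch_npu this prints `Using device: cpu`, so the same script can be used for a CPU reference run.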

Quick start

1. Download the model

pip install modelscope
modelscope download --model Zyphra/Zonos-v0.1-transformer --local_dir ./Zonos-v0.1-transformer

2. Run NPU inference

import torch
import torch_npu
from model import ZonosModel

# Load the weights on the CPU first, then move the model to the NPU
device = torch.device("npu:0")
model = ZonosModel.from_pretrained("./Zonos-v0.1-transformer", device="cpu")
model = model.to(device).eval()

# Build the conditioning prefix
cond_prefix = model.prefix_conditioner(uncond=False)

# Generate tokens
with torch.no_grad():
    codes = model.generate(
        cond_prefix.to(device),
        max_new_tokens=100,
        temperature=0.8,
        top_p=0.95,
    )

print(f"Generated {codes.shape[1]} tokens, shape: {codes.shape}")
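
The `temperature` and `top_p` arguments above control how each token is drawn. The sketch below shows the standard temperature plus nucleus (top-p) sampling procedure in plain Python for a single list of logits. It is a generic illustration: `sample_top_p` is a hypothetical helper, and whether `model.generate` follows exactly this procedure is an assumption.

```python
import math
import random


def sample_top_p(logits, temperature=0.8, top_p=0.95, rng=random):
    """Draw one index from `logits` with temperature scaling and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # random.choices renormalizes the weights of the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(sample_top_p([0.0, 5.0, 1.0], top_p=0.5))
```

With one dominant logit and a small `top_p`, only the most likely token survives the filter, so sampling becomes deterministic.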

3. Command-line inference

# Basic inference
python3 inference.py --tokens 50

# With accuracy verification
python3 inference.py --tokens 50 --verify

# Full benchmark
python3 inference.py --tokens 50 --benchmark --verify --seed 42
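
For readers who want to wrap the script, here is a minimal sketch of how the four flags used above could be declared with argparse. Only the flag names come from the commands above. The defaults and help strings are assumptions, and the real `inference.py` may define them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags seen in the commands above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Zonos NPU inference (illustrative sketch)")
    parser.add_argument("--tokens", type=int, default=50,
                        help="number of tokens to generate (default is an assumption)")
    parser.add_argument("--verify", action="store_true",
                        help="compare the NPU output against a CPU reference")
    parser.add_argument("--benchmark", action="store_true",
                        help="time repeated forward passes")
    parser.add_argument("--seed", type=int, default=None,
                        help="random seed for reproducible runs")
    return parser


args = build_parser().parse_args(["--tokens", "50", "--benchmark", "--verify", "--seed", "42"])
print(args)
```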

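Accuracy evaluation

Evaluation method

With identical inputs (same random seed, seed=42), the NPU inference output is compared against the CPU inference output.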

Metric	Value	Status
Max Absolute Difference	1.41e-04	✅
Mean Absolute Difference	9.98e-06	✅
Max Relative Difference	0.033%	✅ PASS
Mean Relative Difference	0.000078%	✅
Cosine Similarity	1.00000048	✅
Tolerance	< 1.00%	✅

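Conclusion: The NPU output closely matches the CPU baseline. The Max Relative Difference is only 0.033%, far below the 1% tolerance. The Cosine Similarity is 1.00000048; it exceeds 1 only through floating-point rounding, so the outputs are nearly identical. The run log below reports slightly different figures (for example, a max relative difference of 0.001783%), presumably from a separate run; both sets are within tolerance.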
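
The metrics in the table can be computed along the following lines. This is a minimal sketch in plain Python over flat lists of floats; `compare_outputs` is a hypothetical name, and the exact relative-difference formula (here, absolute difference divided by the reference magnitude plus a small `eps`) is an assumption, since the definition used by `inference.py` is not stated.

```python
import math


def compare_outputs(npu, cpu, eps=1e-8, tolerance=0.01):
    """Compare two equally long lists of floats; `cpu` is the reference."""
    if len(npu) != len(cpu) or not cpu:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_diff = [abs(a - b) for a, b in zip(npu, cpu)]
    # eps keeps the ratio finite when a reference value is zero.
    rel_diff = [d / (abs(b) + eps) for d, b in zip(abs_diff, cpu)]
    dot = sum(a * b for a, b in zip(npu, cpu))
    norm = math.sqrt(sum(a * a for a in npu)) * math.sqrt(sum(b * b for b in cpu))
    report = {
        "max_abs": max(abs_diff),
        "mean_abs": sum(abs_diff) / len(abs_diff),
        "max_rel": max(rel_diff),
        "mean_rel": sum(rel_diff) / len(rel_diff),
        "cosine": dot / norm if norm > 0 else float("nan"),
    }
    # PASS when the worst-case relative difference is under the tolerance (1%).
    report["passed"] = report["max_rel"] < tolerance
    return report


report = compare_outputs([1.0, 2.0, 4.0], [1.0, 2.0, 4.0004])
print(f"max rel: {report['max_rel']:.6%}, passed: {report['passed']}")
```

Because rounding can push the computed cosine slightly above 1, as in the table, a check of the form `cosine > 0.9999` is more robust than testing for equality with 1.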

Accuracy verification log

============================================================
  Zonos-v0.1-transformer on Ascend NPU
============================================================
[NPU] Found 2 NPU device(s)
[NPU] Using device: npu:0
[Model] Loading from /opt/atomgit/Zonos-v0.1-transformer...
[Model] Loaded 1,624,411,136 params in 11.56s on npu:0

[Verify] Loading CPU reference model...

[CPU-Baseline] Running on CPU...
[CPU-Baseline] Forward pass: 3605.1ms

============================================================
[Accuracy] Comparing NPU vs CPU outputs...
============================================================
  Max absolute difference:  9.918213e-05
  Mean absolute difference: 1.066261e-05
  Max relative difference:  0.001783%
  Mean relative difference: 0.000080%
  Cosine similarity:        1.00000048
  Tolerance:                1.00%
  Result:                   PASS

Performance benchmarks

Forward-pass performance (seq_len=32, batch_size=1)

Metric	NPU (Ascend 910)	CPU (baseline)	Speedup
Mean latency	23.3 ms	3277.5 ms	~140x
Std dev	0.9 ms	-	-
Min	22.5 ms	-	-
Max	25.5 ms	-	-

10 iterations, measured after warm-up

Generation performance (tokens=50, temperature=0.8)

Metric	Value
Generation speed	18.81 tokens/s
Total time	2.66 s
Tokens generated	50
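
As a quick arithmetic check, throughput and per-token latency follow directly from the token count and the total time. The helper name `throughput` is hypothetical.

```python
def throughput(n_tokens: int, total_seconds: float) -> tuple[float, float]:
    """Return (tokens per second, milliseconds per token)."""
    if n_tokens <= 0 or total_seconds <= 0:
        raise ValueError("n_tokens and total_seconds must be positive")
    return n_tokens / total_seconds, 1000.0 * total_seconds / n_tokens


tokens_per_s, ms_per_token = throughput(50, 2.66)
print(f"{tokens_per_s:.2f} tokens/s, {ms_per_token:.1f} ms/token")
```

With the rounded total of 2.66 s this gives about 18.80 tokens/s and 53.2 ms per token. The small gap to the reported 18.81 tokens/s is presumably because the total time is shown rounded to two decimals.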

Benchmark log

[Benchmark] Running 10 iterations on npu:0...
  Iter  1:   25.5ms
  Iter  2:   23.6ms
  Iter  3:   23.8ms
  Iter  4:   23.5ms
  Iter  5:   22.7ms
  Iter  6:   22.7ms
  Iter  7:   22.7ms
  Iter  8:   22.9ms
  Iter  9:   22.8ms
  Iter 10:   22.5ms
  Average: 23.3ms
  Std:     0.9ms
  Min:     22.5ms
  Max:     25.5ms
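
The summary statistics can be reproduced from the ten per-iteration timings in the log above. This is a small sketch using the standard library; whether the benchmark script uses the sample or the population standard deviation is not stated, and both round to 0.9 ms here.

```python
import statistics

# Per-iteration NPU forward-pass times from the log above, in milliseconds.
timings_ms = [25.5, 23.6, 23.8, 23.5, 22.7, 22.7, 22.7, 22.9, 22.8, 22.5]
cpu_avg_ms = 3277.5  # CPU baseline from the forward-pass table

mean_ms = statistics.fmean(timings_ms)
std_ms = statistics.stdev(timings_ms)  # sample standard deviation
speedup = cpu_avg_ms / mean_ms

print(f"Average: {mean_ms:.1f} ms")
print(f"Std:     {std_ms:.1f} ms")
print(f"Min/Max: {min(timings_ms):.1f} / {max(timings_ms):.1f} ms")
print(f"Speedup: {speedup:.1f}x")
```

The first iteration (25.5 ms) is the slowest and accounts for most of the spread; the remaining nine lie between 22.5 ms and 23.8 ms.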

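Files

File	Description
`model.py`	Zonos model architecture (pure PyTorch implementation, adapted for NPU)
`inference.py`	NPU inference script with accuracy verification and performance benchmarking
`config.json`	Model configuration file
`model.safetensors`	Model weights (3.1 GB)
`evaluation_report.json`	Accuracy and performance evaluation report (JSON)
`README.md`	This file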

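Known limitations

DAC autoencoder: This port covers only the Transformer backbone (text → DAC tokens); the DAC decoder (tokens → audio waveform) still needs to be ported separately.
espeak-ng: Phoneme conversion depends on espeak-ng, which must be installed system-wide or replaced with an alternative.
Voice cloning: Speaker-embedding extraction is not yet included in this NPU port.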
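
Since espeak-ng is a system-level dependency, it can help to check for the executable before running text preprocessing. The sketch below uses only the standard library; `find_espeak` is a hypothetical helper, and falling back to the older `espeak` binary is an assumption about what is acceptable for this project.

```python
import shutil


def find_espeak():
    """Return the path of an eSpeak executable on PATH, or None if absent."""
    for name in ("espeak-ng", "espeak"):
        path = shutil.which(name)
        if path:
            return path
    return None


espeak_path = find_espeak()
if espeak_path is None:
    print("espeak-ng not found; install it (e.g. `apt install espeak-ng` on Ubuntu)")
else:
    print("Found eSpeak at", espeak_path)
```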

Citation

@misc{zyphra2025zonos,
  title     = {Zonos-v0.1: An Expressive, Open-Source TTS Model},
  author    = {Dario Sucic and Mohamed Osman and Gabriel Clark and Chris Warner and Beren Millidge},
  year      = {2025},
}

@misc{ascend2025zonos-npu,
  title     = {Zonos-v0.1-transformer on Ascend NPU},
  author    = {Ascend NPU Adaptation},
  year      = {2026},
  note      = {Adapted for Huawei Ascend 910 NPU; max relative difference vs. CPU verified below the 1% tolerance},
}