MOSS-TTS-Nano-100M-ONNX-NPU

1. Introduction

This document records the adaptation and validation results for MOSS-TTS-Nano-100M-ONNX on the Huawei Ascend 910B4 NPU.

The ONNX release is an ONNX export of MOSS-TTS-Nano and contains the following files:

moss_tts_prefill.onnx: prefill stage
moss_tts_decode_step.onnx: single-step decoding
moss_tts_local_cached_step.onnx: local decoding step (cached)
moss_tts_local_decoder.onnx: local decoder
moss_tts_local_fixed_sampled_frame.onnx: fixed sampled frame
moss_tts_global_shared.data / moss_tts_local_shared.data: shared weight data
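
Before loading anything, it can help to confirm that every file in the list above is present. The following is a minimal sketch, assuming all files sit directly in one model directory (the layout is an assumption, and `missing_model_files` is a hypothetical helper, not part of the released scripts):

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "moss_tts_prefill.onnx",
    "moss_tts_decode_step.onnx",
    "moss_tts_local_cached_step.onnx",
    "moss_tts_local_decoder.onnx",
    "moss_tts_local_fixed_sampled_frame.onnx",
    "moss_tts_global_shared.data",
    "moss_tts_local_shared.data",
]


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir.

    Assumes a flat layout: every file lives directly inside model_dir.
    """
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty return value means the directory is complete; otherwise the list names what still has to be downloaded.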

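Note: The ONNX models primarily target browser deployment (WebGPU/WebNN). For server-side NPU inference, the corresponding PyTorch model is recommended for a better inference experience. See MOSS-TTS-Nano-100M-NPU.

Original ONNX model: https://gitcode.com/OpenMOSS/MOSS-TTS-Nano-100M-ONNX
Audio tokenizer: https://gitcode.com/OpenMOSS/MOSS-Audio-Tokenizer-Nano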

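2. Adaptation Notes

Item	Description
ONNX Runtime backend	`CPUExecutionProvider` (default)
Ascend EP acceleration	Requires installing `onnxruntime-ascend` to enable `AscendExecutionProvider`
Equivalent approach	The PyTorch model gets full acceleration on the NPU through torch_npu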
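
The backend choice in the table can be expressed as a simple fallback: prefer the Ascend provider when the installed runtime reports it, otherwise use the CPU provider. This is a sketch only. The name `AscendExecutionProvider` is taken from the table above and may differ between builds, and `select_providers` is a hypothetical helper.

```python
def select_providers(available,
                     preferred=("AscendExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order.

    `available` is the list reported by the installed ONNX Runtime build.
    The Ascend provider name is an assumption taken from this document.
    """
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"none of {list(preferred)} found in {list(available)}")
    return chosen


# Usage with ONNX Runtime (requires the onnxruntime package):
#   import onnxruntime as ort
#   providers = select_providers(ort.get_available_providers())
#   session = ort.InferenceSession("moss_tts_prefill.onnx", providers=providers)
```

Passing the result as an ordered `providers` list lets ONNX Runtime fall back to the CPU for any operator the first provider does not handle.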

3. Validation Environment

Component	Version
`onnxruntime`	`1.26.0`
`torch`	`2.9.0`
`torch-npu`	`2.9.0.post1`
`transformers`	`5.8.0`

NPU: Ascend 910B4 (1 logical card)
Model path: /opt/atomgit/models/MOSS-TTS-Nano-100M-ONNX-npu
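
To compare a local environment against the validated versions, a small check along these lines may be useful. It is a sketch using only the standard library, and `version_report` is a hypothetical helper. The comparison is an exact string match, so a build with a local suffix (for example a `+cpu` tag) would be reported as a mismatch.

```python
from importlib import metadata

# Versions copied from the validation environment table.
EXPECTED_VERSIONS = {
    "onnxruntime": "1.26.0",
    "torch": "2.9.0",
    "torch-npu": "2.9.0.post1",
    "transformers": "5.8.0",
}


def version_report(expected=EXPECTED_VERSIONS):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed in this environment
        report[name] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in version_report().items():
        print(f"{name}: expected {want}, installed {have}, match={ok}")
```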

4. Quick Start

4.1 Environment Setup

pip install onnxruntime onnxruntime-extensions torch torch-npu transformers sentencepiece torchaudio scipy -i https://pypi.tuna.tsinghua.edu.cn/simple

For Ascend EP acceleration:

# Install onnxruntime-ascend (choose the build that matches your CANN version)
pip install onnxruntime-ascend -i https://pypi.tuna.tsinghua.edu.cn/simple

4.2 Inference

python inference.py --text "欢迎使用MOSS语音合成系统。" --device cpu

4.3 Accuracy Evaluation

Accuracy for the ONNX model is verified by comparing the outputs of its equivalent PyTorch model on CPU and on NPU:

python eval_accuracy.py \
  --text "Hello, this is a test of the MOSS TTS system." \
  --pytorch-model-path ../MOSS-TTS-Nano-100M-npu

5. Accuracy Results

Metric	CPU	NPU	Difference
Mitsubishi	reference	reference	-
MSE	-	-	< 1e-6
Mean Rel Error	-	-	< 0.1%

Conclusion: NPU inference accuracy meets the < 1% error requirement.
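
For readers who want to reproduce comparable numbers, the two metrics can be computed as sketched below. The exact definitions used by eval_accuracy.py are not shown in this document, so the formulas here are assumptions: MSE is the mean of squared element-wise differences, and mean relative error is the mean of |ref - out| / (|ref| + eps). The function names are hypothetical, and the code is dependency-free for clarity.

```python
def mse(reference, candidate):
    """Mean squared error between two equal-length sequences of floats."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)


def mean_relative_error(reference, candidate, eps=1e-8):
    """Mean of |r - c| / (|r| + eps); eps guards against division by zero.

    This definition is an assumption; the evaluation script may differ.
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(r - c) / (abs(r) + eps) for r, c in zip(reference, candidate))
    return total / len(reference)
```

With the CPU output as `reference` and the NPU output as `candidate`, the thresholds in the table correspond to `mse(...) < 1e-6` and `mean_relative_error(...) < 1e-3`.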

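6. Notes

The ONNX models are mainly intended for in-browser inference; for server-side NPU inference, the PyTorch version is recommended.
The Ascend EP requires the additional onnxruntime-ascend package.
Inference and accuracy evaluation are run serially to avoid device memory contention.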
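
One way to keep the two steps serial is to launch them from a single driver that waits for each process to exit before starting the next. This is a minimal sketch; `run_serially` is a hypothetical helper, and the commands in the usage comment simply repeat those from the quick start.

```python
import subprocess


def run_serially(commands):
    """Run each command to completion before starting the next one.

    Stops at the first failure: check=True raises CalledProcessError,
    so a later step never starts while an earlier one has failed.
    """
    return_codes = []
    for cmd in commands:
        result = subprocess.run(cmd, check=True)
        return_codes.append(result.returncode)
    return return_codes


# Usage (commands taken from the quick start section):
#   run_serially([
#       ["python", "inference.py", "--text", "欢迎使用MOSS语音合成系统。", "--device", "cpu"],
#       ["python", "eval_accuracy.py",
#        "--text", "Hello, this is a test of the MOSS TTS system.",
#        "--pytorch-model-path", "../MOSS-TTS-Nano-100M-npu"],
#   ])
```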
