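sherpa (gomodels/sherpa) - Ascend NPU Adaptation

Model Overview

This repository contains the Ascend NPU adaptation of gomodels/sherpa, which bundles two state-of-the-art speech models: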

Model	Task	Architecture	Backend
SenseVoice	Automatic speech recognition (ASR)	CTC Transformer	ONNX Runtime + torch_npu
Matcha-TTS	Text-to-speech (TTS)	Flow Matching + Vocos	ONNX Runtime + torch_npu

Supported Languages (ASR)

Chinese (zh) - Mandarin
English (en)
Japanese (ja)
Korean (ko)
Cantonese (yue)

Original Models

SenseVoice: FunAudioLLM/SenseVoice
Matcha-TTS: shivammehta25/Matcha-TTS
ModelScope: gomodels/sherpa

NPU Adaptation Details

Hardware

Platform: Huawei Ascend 910 NPU
CANN version: 8.5.1
torch_npu: 2.9.0.post1

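Adaptation Strategy

The NPU adaptation uses a hybrid acceleration architecture:

Core model inference: ONNX Runtime (optimized CPU backend)
NPU-accelerated pre- and post-processing: torch_npu on the Ascend 910 handles:
- Audio normalization and feature pre-processing
- Post-inference audio enhancement and normalization
- Tensor operations offloaded to the NPU to reduce CPU load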
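
The split described above can be illustrated with a minimal sketch. It assumes that pre-processing is a simple peak normalization and that the code should fall back to NumPy on the CPU when torch or torch_npu is not importable. The helper names `pick_device` and `peak_normalize` and the 0.95 peak level are hypothetical and are not taken from `inference.py`.

```python
import numpy as np


def pick_device():
    """Return "npu" if torch and torch_npu import and an NPU is visible, else "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers the "npu" device)
        return "npu" if torch.npu.is_available() else "cpu"
    except Exception:
        # Missing packages or a broken driver stack: stay on the CPU path.
        return "cpu"


def peak_normalize(samples, device="cpu", peak=0.95):
    """Scale a mono waveform so that its largest absolute sample equals `peak`.

    Assumption: the 0.95 target level is illustrative only.
    """
    samples = np.asarray(samples, dtype=np.float32)
    if samples.size == 0:
        return samples
    if device == "npu":
        import torch
        x = torch.from_numpy(samples).to("npu")
        scale = x.abs().max().clamp(min=1e-8)  # avoid dividing by zero on silence
        return (x / scale * peak).cpu().numpy()
    scale = max(float(np.max(np.abs(samples))), 1e-8)
    return samples / scale * peak


if __name__ == "__main__":
    device = pick_device()
    normalized = peak_normalize([0.1, -0.5, 0.25], device=device)
    print(device, normalized)
```

The normalized waveform would then be handed to the ONNX Runtime session, which in this adaptation runs on the CPU backend.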

Accuracy Validation

Task	Metric	Result	Threshold	Status
ASR (SenseVoice)	Text match	100% match	Exact match	Pass
TTS (Matcha-TTS)	NRMSE	0.0000%	< 1%	Pass
TTS (Matcha-TTS)	Correlation	1.0000	> 0.99	Pass
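
The README does not say how NRMSE is normalized, so the following is only a sketch of how the two TTS metrics could be computed. It assumes NRMSE is the RMSE divided by the value range of the reference (CPU) waveform, and that correlation is the Pearson coefficient. The function names are hypothetical; the thresholds come from the table above.

```python
import numpy as np


def _as_pair(reference, candidate):
    """Convert both signals to 1-D float64 arrays and check that they align."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    if ref.size == 0 or ref.shape != cand.shape:
        raise ValueError("signals must be non-empty and of equal length")
    return ref, cand


def nrmse_percent(reference, candidate):
    """RMSE normalized by the reference value range, in percent (assumed definition)."""
    ref, cand = _as_pair(reference, candidate)
    rmse = float(np.sqrt(np.mean((ref - cand) ** 2)))
    span = float(ref.max() - ref.min())
    if span == 0.0:
        return 0.0 if rmse == 0.0 else float("inf")
    return 100.0 * rmse / span


def correlation(reference, candidate):
    """Pearson correlation coefficient between the two waveforms."""
    ref, cand = _as_pair(reference, candidate)
    return float(np.corrcoef(ref, cand)[0, 1])


def passes(reference, candidate):
    """Apply the thresholds from the table: NRMSE < 1% and correlation > 0.99."""
    return (nrmse_percent(reference, candidate) < 1.0
            and correlation(reference, candidate) > 0.99)
```

In use, `reference` would be the waveform from CPU mode and `candidate` the waveform from NPU mode for the same input text.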

Performance Benchmarks

Task	Device	Mean time (s)	RTF	Speed (x real time)
ASR (5.6 s audio)	CPU	0.288	0.052	19.4x
ASR (5.6 s audio)	NPU	0.275	0.049	20.3x
TTS (4.1 s output)	CPU	0.260	0.064	15.7x
TTS (4.1 s output)	NPU	0.274	0.067	14.9x
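
For readers unfamiliar with the columns, the real-time factor (RTF) is processing time divided by audio duration, and the speed column is its reciprocal. The sketch below uses hypothetical function names and reproduces the RTF of the ASR NPU row. Because the audio durations in the table are rounded to one decimal place, recomputed values can differ from the table in the last digit.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(processing_seconds, audio_seconds):
    """Speed as a multiple of real time, i.e. the reciprocal of the RTF."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    # ASR on NPU: 0.275 s mean time for roughly 5.6 s of audio.
    print(f"RTF = {real_time_factor(0.275, 5.6):.3f}")  # RTF = 0.049
```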

Quick Start

Environment Setup

# Install dependencies
pip install sherpa-onnx onnxruntime onnx soundfile numpy librosa

# Ascend NPU dependencies (pre-installed on Ascend servers)
# torch_npu, CANN toolkit

Download the Model

pip install modelscope
modelscope download --model gomodels/sherpa

Inference

Speech recognition (ASR):

# CPU mode
python inference.py --task asr --input audio.wav --device cpu

# NPU mode
python inference.py --task asr --input audio.wav --device npu

# Test with included sample
python inference.py --task asr \
  --input ~/.cache/modelscope/hub/models/gomodels/sherpa/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/test_wavs/zh.wav \
  --device npu
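
The contents of `inference.py` are not shown here. As a rough idea of what the ASR path involves, the sketch below calls the sherpa-onnx Python API directly. The file names `model.onnx` and `tokens.txt` inside the model directory are assumptions based on the upstream SenseVoice release, and `transcribe` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def transcribe(wav_path, model_dir, language="auto"):
    """Decode one audio file with SenseVoice through sherpa-onnx (CPU path).

    Assumes model_dir contains model.onnx and tokens.txt.
    """
    wav = Path(wav_path).expanduser()
    model_dir = Path(model_dir).expanduser()
    model = model_dir / "model.onnx"
    tokens = model_dir / "tokens.txt"
    for path in (wav, model, tokens):
        if not path.is_file():
            raise FileNotFoundError(path)

    # Imported lazily so that path errors surface before the heavy imports.
    import numpy as np
    import sherpa_onnx
    import soundfile as sf

    recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
        model=str(model),
        tokens=str(tokens),
        language=language,  # "auto", "zh", "en", "ja", "ko" or "yue"
        use_itn=True,       # inverse text normalization on the transcript
    )
    samples, sample_rate = sf.read(str(wav), dtype="float32", always_2d=True)
    mono = np.ascontiguousarray(samples[:, 0])  # keep the first channel only

    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, mono)
    recognizer.decode_stream(stream)
    return stream.result.text
```

Calling `transcribe` with the `zh.wav` sample and the model directory from the command above should return the transcript as a string, provided the assumed file names match the downloaded files.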

Text-to-speech (TTS):

# CPU mode
python inference.py --task tts --input "你好世界" --output output.wav --device cpu

# NPU mode
python inference.py --task tts --input "你好世界" --output output.wav --device npu

Accuracy validation:

python inference.py --task validate

Performance benchmark:

# CPU + NPU comparison
python benchmark.py --task all --runs 5 --device both
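
How `benchmark.py` measures time is not documented here. A plausible minimal harness, sketched under that caveat, runs the workload several times and averages the wall-clock durations. The `time_runs` name and the single warm-up run are assumptions; `runs=5` mirrors the `--runs 5` flag above.

```python
import statistics
import time


def time_runs(fn, runs=5, warmup=1):
    """Call fn() `warmup` times untimed, then `runs` times timed.

    Returns (mean_seconds, list_of_durations).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before measuring
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), durations


if __name__ == "__main__":
    mean_seconds, _ = time_runs(lambda: sum(range(100_000)), runs=5)
    print(f"mean time: {mean_seconds:.6f} s")
```

The mean time divided by the audio duration gives the RTF reported in the benchmark table.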

File Structure

sherpa-npu/
├── inference.py        # Main inference script (ASR + TTS, CPU + NPU)
├── npu_tts.py          # NPU-accelerated TTS with PyTorch weights
├── benchmark.py        # Performance benchmarking script
├── README.md           # This documentation
├── requirements.txt    # Python dependencies
└── results/            # Evaluation results
    ├── accuracy_validation.json
    └── benchmark.json

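Evaluation Results

Accuracy

ASR transcripts are identical in CPU and NPU modes
TTS audio output matches, with NRMSE < 0.0001% (effectively identical)
Both models retain the inference accuracy of the original ONNX Runtime pipeline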

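Performance Characteristics

The real-time factor (RTF) stays below 0.07 for both ASR and TTS
NPU mode cuts mean ASR time by about 5% (0.288 s to 0.275 s); TTS is about 5% slower in NPU mode (0.260 s to 0.274 s)
Both CPU and NPU modes run at roughly 15x to 20x real time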

License

Apache License 2.0

Citation
