panhg/speech_UniASR_asr_2pass-tr-16k-common-vocab1582-pytorch

UniASR Turkish Speech Recognition - Ascend NPU Adaptation

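Model Description

UniASR is a Turkish two-pass (streaming + offline) end-to-end speech recognition model developed by Alibaba DAMO Academy. This repository provides an Ascend NPU adaptation whose accuracy and performance have been verified on Huawei Ascend hardware.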

Model: iic/speech_UniASR_asr_2pass-tr-16k-common-vocab1582-pytorch
Architecture: SANM encoder + SCAMA decoder + CIF predictor (two-pass)
Sampling rate: 16 kHz
Vocabulary: 1582 BPE units (Turkish)
Input: WAV/PCM audio; utterances shorter than 20 seconds are recommended
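The two input constraints above (16 kHz sampling rate, utterances under 20 seconds) can be checked before running inference. The following is a minimal sketch using only the Python standard library; the function name `check_wav` and the constant names are illustrative and are not part of this repository.

```python
import wave

EXPECTED_RATE_HZ = 16000  # sampling rate stated for this model
MAX_DURATION_S = 20.0     # recommended upper bound per utterance


def check_wav(path: str) -> tuple[bool, str]:
    """Return (ok, message) for a PCM WAV file checked against the limits above."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if rate != EXPECTED_RATE_HZ:
        return False, f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz"
    if duration >= MAX_DURATION_S:
        return False, f"duration is {duration:.2f} s, recommended < {MAX_DURATION_S:.0f} s"
    return True, f"ok: {rate} Hz, {duration:.2f} s"
```

The standard `wave` module reads uncompressed PCM WAV only; other formats would need a library such as soundfile, which is already in the dependency list.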

NPU Adaptation Summary

Item	Result
NPU hardware	Ascend 910B (910_9362)
CANN version	8.5.1
PyTorch	2.9.0 + torch_npu
torch_npu	Latest
Framework	FunASR 1.3.1
Accuracy (CPU vs. NPU)	0.000000% character error rate (exact match)
Accuracy check	Passed (requirement: < 1%)
CPU RTF	0.669
NPU RTF	0.754

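Note: The output of the NPU-adapted version is bit-for-bit identical to that of CPU inference. UniASR decodes with beam search, which is deterministic under the same random seed, so the transcribed text is reproduced exactly.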

Quick Start

1. Install dependencies

pip install funasr modelscope torch_npu soundfile librosa

2. Download the model

modelscope download --model iic/speech_UniASR_asr_2pass-tr-16k-common-vocab1582-pytorch

3. Run inference

# NPU inference
python3 inference.py --audio example.wav --device npu:0

# CPU inference (with accuracy comparison)
python3 inference.py --audio example.wav --device cpu --compare-cpu

# Save result
python3 inference.py --audio example.wav --output result.txt

4. Python API

from inference import UniASRInference

engine = UniASRInference(device="npu:0")
result = engine.transcribe("audio.wav")
print(result["text"])

5. Run the benchmark

python3 benchmark.py

This runs both CPU and NPU inference on the example audio and writes benchmark_report.json.
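The RTF figures in the summary table can be read as a real-time factor. The sketch below assumes the conventional definition (processing time divided by audio duration); how `benchmark.py` actually measures it is not shown in this document, and the function names here are illustrative.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio that was processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to `transcribe` with a wall clock and convert it to an RTF."""
    start = time.perf_counter()
    transcribe()
    return real_time_factor(time.perf_counter() - start, audio_seconds)
```

Under this definition lower is better, and a value below 1 means the audio is processed faster than real time, which holds for both reported values (0.669 and 0.754).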

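Accuracy Verification

The NPU adaptation produces text that matches CPU inference exactly (100.00% match, with a character error rate of 0.000000% over 194 characters). The model uses deterministic beam search decoding, so the NPU and CPU paths produce identical output across the full pretrained inference pipeline.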

Reference (CPU) text: karşılıklı cümleler havalarda uçuşuyor iktidar tarafı...
Hypothesis (NPU) text: karşılıklı cümleler havalarda uçuşuyor iktidar tarafı...
Character errors: 0/194
Exact match: True
Error rate: 0.000000% < 1% ✓
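A character error rate like the one reported above can be computed as follows. This is a minimal sketch that assumes the usual definition (Levenshtein edit distance divided by the reference length); the exact computation in `benchmark.py` is not shown here, and the function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Character errors divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

With the CPU transcript as the reference and the NPU transcript as the hypothesis, identical strings give a distance of 0 and therefore an error rate of 0.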

Environment

Component	Version / Info
Python	3.11.14
PyTorch	2.9.0
torch_npu	Latest (compatible with CANN 8.5.1)
CANN	8.5.1
FunASR	1.3.1
ModelScope	Latest
NPU devices	Ascend 910B × 2

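Deliverables

File	Description
`inference.py`	NPU/CPU inference script with a command-line interface and a Python API
`benchmark.py`	Accuracy and performance benchmark script
`benchmark_report.json`	Full benchmark results (JSON)
`run_log.txt`	Runtime environment log
`README.md`	This document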

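Model Architecture

UniASR combines two ASR paths that share one dynamic encoder:

Streaming path: dynamic-latency SANM encoder + SCAMA decoder, for low-latency output
Offline path: stride convolution + large-chunk encoder + text encoder + SCAMA decoder, for high-accuracy refresh

Three decoding modes are supported:

fast: single-pass, low-latency streaming
normal: two-pass, with an offline refresh every 3-6 seconds
offline: single-pass, high-accuracy offline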

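NPU Adaptation Details

The adaptation strategy is straightforward: load the FunASR AutoModel with device="npu:0", which places all model parameters and input tensors on the NPU through torch_npu. No changes to the model architecture are needed, because torch_npu provides full operator coverage for the SANM encoder (LayerNorm, multi-head attention, convolution) and the SCAMA decoder components used by UniASR.

Key points:

Set FunASR's device parameter to "npu:0" to route all tensors to the NPU
Audio preprocessing (FBank feature extraction) runs on the CPU, and the features are then moved to the NPU
Model weights and computation (encoder, decoder, beam search) run entirely on the NPU
Beam search decoding runs on the NPU, and the result is transferred back to the CPU for text output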
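The loading step described above might look like the sketch below. It assumes FunASR's `AutoModel(model=..., device=...)` constructor and its `generate(input=...)` method, and that `generate` returns a list of dicts with a "text" key; the helper names `select_device` and `transcribe` are hypothetical and do not come from `inference.py`. The CPU fallback is an added convenience, not something the repository is stated to do.

```python
import importlib.util

MODEL_ID = "iic/speech_UniASR_asr_2pass-tr-16k-common-vocab1582-pytorch"


def select_device(preferred: str = "npu:0") -> str:
    """Return `preferred` if it can be used, otherwise fall back to "cpu"."""
    if not preferred.startswith("npu"):
        return preferred
    # Without both packages installed there is no NPU backend to use.
    for name in ("torch", "torch_npu"):
        if importlib.util.find_spec(name) is None:
            return "cpu"
    import torch
    import torch_npu  # noqa: F401  (importing it registers the "npu" backend)
    return preferred if torch.npu.is_available() else "cpu"


def transcribe(audio_path: str, device: str) -> str:
    """Load the model on `device` and return the recognized text (assumed API)."""
    from funasr import AutoModel  # lazy import: select_device works without FunASR
    model = AutoModel(model=MODEL_ID, device=device)
    results = model.generate(input=audio_path)
    return results[0]["text"]
```

A call such as `transcribe("example.wav", select_device("npu:0"))` would then run on the NPU when one is visible and on the CPU otherwise.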

Citation

@inproceedings{gao2020universal,
  title={Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model},
  author={Gao, Zhifu and Zhang, Shiliang and Lei, Ming and McLoughlin, Ian},
  booktitle={arXiv preprint arXiv:2010.14099},
  year={2020}
}

License

Apache License 2.0