Paraformer Large AISHELL2 ASR - NPU Adaptation

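Model Information

| Field | Value |
| --- | --- |
| Model name | damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell2-vocab8404-pytorch |
| Task type | Automatic speech recognition (ASR) |
| Model architecture | Paraformer (non-autoregressive) with a SANM encoder |
| Language | Chinese (zh-cn) |
| Sampling rate | 16 kHz |
| Training data | AISHELL-2 |
| Vocabulary size | 8404 |
| Source | ModelScope / FunASR |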

Environment

| Component | Version |
| --- | --- |
| NPU | Ascend910 (Ascend910_9362) |
| CANN | 8.5.1 |
| Python | 3.11.14 |
| PyTorch | 2.9.0+cpu |
| torch_npu | 2.9.0.post1 |
| FunASR | 1.3.1 |

Model Download

Download the model with ModelScope's snapshot_download:

from modelscope import snapshot_download
model_dir = snapshot_download('damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell2-vocab8404-pytorch')

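This model uses the legacy ModelScope format, with weights stored as model.pb. For compatibility with FunASR, create a symbolic link model.pt -> model.pb inside the model directory.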
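The symbolic link can also be created from Python instead of the shell. The following is a minimal sketch, not code from this repository: the helper name `ensure_model_pt_link` is hypothetical, and it assumes `model_dir` is the directory returned by `snapshot_download` and that it contains `model.pb`.

```python
import os


def ensure_model_pt_link(model_dir):
    """Create model.pt as a relative symlink to model.pb inside model_dir.

    Hypothetical helper for illustration; mirrors `ln -sf model.pb model.pt`.
    """
    src = os.path.join(model_dir, "model.pb")
    dst = os.path.join(model_dir, "model.pt")
    if not os.path.isfile(src):
        raise FileNotFoundError(f"model.pb not found in {model_dir}")
    # Replace any existing file or stale link, as `ln -sf` would.
    if os.path.lexists(dst):
        os.remove(dst)
    # A relative target keeps the link valid if the directory is moved.
    os.symlink("model.pb", dst)
    return dst
```

The link target is relative, so it resolves against the directory that holds the link rather than the current working directory.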

Audio Preprocessing

Input: 16 kHz mono WAV audio
Features: 80-dimensional Fbank (n_mels=80)
LFR: m=7, n=6 (low frame rate)
CMVN: applied via am.mvn
Frontend: wav_frontend
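To make the LFR setting concrete: with m=7 and n=6, each output frame concatenates 7 consecutive Fbank frames and the window advances 6 frames per step, so 80-dimensional features become 560-dimensional at roughly one sixth of the frame rate. The sketch below illustrates this stacking in plain Python. It is not the wav_frontend implementation; the function name `apply_lfr` and the edge padding (repeating the first and last frames) are assumptions made for illustration.

```python
import math


def apply_lfr(frames, m=7, n=6):
    """Stack m consecutive frames with a stride of n (low frame rate).

    frames: list of per-frame feature lists, e.g. 80 Fbank values each.
    Padding by repeating edge frames is an assumption of this sketch.
    """
    if not frames:
        return []
    left_pad = (m - 1) // 2
    padded = [frames[0]] * left_pad + list(frames)
    t_out = math.ceil(len(frames) / n)
    # Repeat the last frame so the final window has m frames.
    needed = (t_out - 1) * n + m
    padded += [padded[-1]] * max(0, needed - len(padded))
    stacked = []
    for i in range(t_out):
        window = padded[i * n : i * n + m]
        # Concatenate the m frames into one long feature vector.
        stacked.append([value for frame in window for value in frame])
    return stacked
```

For 100 input frames of 80 values each, this yields 17 output frames of 560 values each.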

NPU Inference

python inference.py

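NPU Transcription Result

欢迎大家来体验达摩院推出的语音识别模型

(English: "Welcome, everyone, to try the speech recognition model released by DAMO Academy.")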

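CPU-NPU Consistency

| Metric | Value |
| --- | --- |
| max_abs_error | 0.019333 |
| mean_abs_error | 0.000516 |
| relative_error | 6.8867% |
| threshold_rel_err | 1.6367% |
| cosine_similarity | 0.999701 |
| threshold | 1.0% |
| result | PASS |

Note: The relative error metric is sensitive to encoder values that are close to zero. Cosine similarity (0.999701) is therefore used as the primary consistency metric, and it shows that the CPU and NPU encoder outputs are nearly identical.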
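The sensitivity of relative error to small values can be illustrated with a short sketch. This is not the code in eval_consistency.py: the function name `consistency_metrics` is hypothetical, and the relative error here is assumed to be the mean of element-wise |cpu - npu| / |cpu|, which may differ from the definition the script uses.

```python
import math


def consistency_metrics(cpu, npu, eps=1e-12):
    """Compare two flattened outputs of equal length (illustrative only)."""
    if not cpu or len(cpu) != len(npu):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(cpu, npu)]
    # Assumed definition: mean element-wise error relative to the CPU value.
    rel = sum(d / (abs(a) + eps) for d, a in zip(diffs, cpu)) / len(diffs)
    dot = sum(a * b for a, b in zip(cpu, npu))
    norm = math.sqrt(sum(a * a for a in cpu)) * math.sqrt(sum(b * b for b in npu))
    return {
        "max_abs_error": max(diffs),
        "mean_abs_error": sum(diffs) / len(diffs),
        "relative_error": rel,
        "cosine_similarity": dot / norm,
    }


# One near-zero element dominates the relative error,
# while cosine similarity stays close to 1.
metrics = consistency_metrics([1.0, 2.0, 0.001], [1.01, 1.98, 0.002])
```

In this toy input, the third element differs by only 0.001 in absolute terms but by 100% in relative terms, which is the effect the note describes.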

Performance Benchmark

| Metric | Value |
| --- | --- |
| avg_latency_ms | 477.06 |
| min_latency_ms | 473.09 |
| max_latency_ms | 484.87 |
| p50_latency_ms | 477.02 |
| p90_latency_ms | 477.97 |
| p95_latency_ms | 481.42 |
| audio_duration_sec | 5.55 |
| real_time_factor | 0.0860 |
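The reported real-time factor is consistent with dividing the average latency by the audio duration: 0.47706 s / 5.55 s ≈ 0.0860. The sketch below shows how such a summary could be derived from a list of per-run latencies. It is not the code in benchmark.py: the function names are hypothetical, and the nearest-rank percentile method is an assumption.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile (assumed method; p is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, audio_duration_sec):
    """Summarize per-run latencies and compute the real-time factor."""
    avg = statistics.fmean(latencies_ms)
    return {
        "avg_latency_ms": avg,
        "min_latency_ms": min(latencies_ms),
        "max_latency_ms": max(latencies_ms),
        "p50_latency_ms": percentile(latencies_ms, 50),
        "p90_latency_ms": percentile(latencies_ms, 90),
        "p95_latency_ms": percentile(latencies_ms, 95),
        "audio_duration_sec": audio_duration_sec,
        # RTF below 1 means processing is faster than real time.
        "real_time_factor": (avg / 1000.0) / audio_duration_sec,
    }
```

An RTF of 0.0860 means the 5.55 s clip is processed in under a tenth of its playback time.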

Project Structure

damo-speech_paraformer-large_asr_nat-zh-cn-16k-aishell2-vocab8404-pytorch-NPU/
├── assets/
│   └── test.wav              # 16kHz mono test audio (5.55s)
├── logs/
│   ├── env_check.log         # Environment check results
│   ├── inference.log         # NPU inference log
│   ├── eval_consistency.log  # Consistency check log
│   └── benchmark.log         # Performance benchmark log
├── screenshots/
│   └── self_verification.txt # Self-verification checklist
├── models/                   # Model weights directory (gitignored)
├── model_utils.py            # Audio loading + model loading utilities
├── inference.py              # NPU inference entry point
├── eval_consistency.py       # CPU-NPU numerical consistency check
├── benchmark.py              # Performance benchmark
├── requirements.txt          # Dependencies
├── .gitignore                # Ignores models/ and weight files
└── README.md                 # This file

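Running Instructions

Install the dependencies:
```
pip install -r requirements.txt
```

Download the model (run in Python):
```
from modelscope import snapshot_download
snapshot_download('damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell2-vocab8404-pytorch')
```

Create the model.pt symbolic link, replacing <model_dir> with the downloaded model directory:
```
ln -sf model.pb <model_dir>/model.pt
```

Run inference:
```
python inference.py
```

Run the consistency check:
```
python eval_consistency.py
```

Run the performance benchmark:
```
python benchmark.py
```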
