gyccc/iic-speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020-NPU

speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020 NPU Adaptation

Model Information

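| Item | Content |
|---|---|
| Model name | iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020 |
| Task type | Automatic speech recognition (ASR) |
| Architecture | Paraformer-Large + VAD + punctuation |
| Framework | FunASR 1.3.1 |
| Source | ModelScope (DAMO Academy) |
| Language | English |
| Sample rate | 16 kHz |
| Features | Non-streaming offline inference; supports long audio, automatic voice activity detection (VAD), and punctuation restoration |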

Environment

| Item | Version |
|---|---|
| NPU | Ascend910_9362 |
| CANN | 8.5.1 |
| Python | 3.11.14 |
| torch | 2.x |
| torch_npu | 2.9.0 |
| FunASR | 1.3.1 |
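The repository keeps a `logs/env_check.log`, but the check script itself is not shown. The following is a hypothetical sketch of how the Python-side versions in the table above could be collected, assuming the packages are installed under their PyPI names `torch`, `torch_npu`, and `funasr`. CANN is not a pip package, so its version is not covered here.

```python
from importlib.metadata import PackageNotFoundError, version
import platform

# Assumed distribution names; adjust if the environment installs them differently.
PACKAGES = ("torch", "torch_npu", "funasr")


def collect_versions(packages=PACKAGES):
    """Return {name: installed version or None}, plus the running Python version."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = None  # package not installed in this environment
    return info


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name:10s} {ver or 'not installed'}")
```

Comparing this output against the table is a quick way to confirm that `torch` and `torch_npu` are a matching pair before running anything on the NPU.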

Model Download

```python
from modelscope import snapshot_download
# Download the model from ModelScope (cached locally) and return the local directory
model_dir = snapshot_download("iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020")
```

Audio Preprocessing

  • Input format: WAV, 16 kHz, mono
  • Preprocessing: audio is loaded with load_wav() and resampled to 16 kHz
  • Loading uses a three-level fallback: torchaudio / soundfile / wave

NPU Inference Command

```bash
python inference.py
```
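`inference.py` is not included in this section. As a rough, hypothetical sketch, assuming it follows FunASR's `AutoModel(...).generate(input=...)` API and that `AutoModel` moves the model to whatever `device` string it receives, an NPU run could look like the following. Whether this combined model ID loads VAD and punctuation on its own, or needs separate `vad_model`/`punc_model` arguments, is an assumption to check against the model's configuration. `pick_device` and `extract_text` are illustrative helpers, not part of FunASR.

```python
def pick_device(index=0):
    """Return "npu:<index>" if torch_npu imports and an NPU is visible, else "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the "npu" device with torch)
    except Exception:
        return "cpu"
    return f"npu:{index}" if torch.npu.is_available() else "cpu"


def extract_text(results):
    """generate() returns a list of dicts; join their "text" fields."""
    return " ".join(r.get("text", "") for r in results).strip()


def main(wav_path="assets/test.wav"):
    # Imported lazily so the helpers above are usable without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(
        model="iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020",
        device=pick_device(),  # assumption: FunASR accepts an "npu:N" device string
    )
    print(extract_text(model.generate(input=wav_path)))


if __name__ == "__main__":
    main()
```

Falling back to `"cpu"` keeps the same script usable for the CPU side of the consistency comparison.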

NPU Inference Output

refuse horace vo kingdom ibrahim horace re architectural identities kingdom float peck splendor against rubbed hainanese unequal retention sheriffng consist inquired assemble vo

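Note: the test audio is Chinese speech, but this is an English model, so the output is a sequence of English words rather than a meaningful transcript. It shows that the pipeline runs end to end on the NPU, not how accurate the recognition is.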

CPU vs. NPU Numerical Consistency

| Metric | Value |
|---|---|
| max_abs_error | 0.000684 |
| mean_abs_error | 0.000023 |
| relative_error | 0.0479% |
| cosine_similarity | 0.99999977568 |
| threshold | 1.0% |
| Result | PASS |
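The section does not state which tensors are compared or exactly how `relative_error` is defined. The sketch below assumes the CPU and NPU outputs are flattened to equal-length float sequences, that `relative_error` is the L2 norm of the difference divided by the L2 norm of the CPU reference, and that PASS means `relative_error` is below the 1.0% threshold. These definitions are assumptions, not necessarily what `eval_consistency.py` does.

```python
import math


def consistency_metrics(cpu, npu, threshold=0.01):
    """Compare two flattened outputs; threshold is a fraction (0.01 == 1.0%)."""
    if len(cpu) != len(npu) or not cpu:
        raise ValueError("outputs must be non-empty and the same length")
    diffs = [abs(a - b) for a, b in zip(cpu, npu)]
    norm_cpu = math.sqrt(sum(a * a for a in cpu))
    norm_npu = math.sqrt(sum(b * b for b in npu))
    norm_diff = math.sqrt(sum(d * d for d in diffs))
    # Assumed definition: ||npu - cpu||_2 / ||cpu||_2 (CPU treated as the reference)
    if norm_cpu > 0:
        relative = norm_diff / norm_cpu
    else:
        relative = 0.0 if norm_diff == 0 else math.inf
    dot = sum(a * b for a, b in zip(cpu, npu))
    cosine = dot / (norm_cpu * norm_npu) if norm_cpu and norm_npu else 0.0
    return {
        "max_abs_error": max(diffs),
        "mean_abs_error": sum(diffs) / len(diffs),
        "relative_error": relative,
        "cosine_similarity": cosine,
        "passed": relative < threshold,
    }
```

Multiplying `relative_error` by 100 gives a percentage in the same form as the table's 0.0479%.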

Benchmark Results

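| Metric | Value |
|---|---|
| avg_latency_ms | 105.76 |
| min_latency_ms | 103.34 |
| max_latency_ms | 107.55 |
| p50_latency_ms | 105.72 |
| p90_latency_ms | 107.53 |
| p95_latency_ms | 107.54 |
| audio_duration_sec | 5.55 |
| real_time_factor | 0.0191 |

real_time_factor is the average latency divided by the audio duration: 105.76 ms / 5550 ms ≈ 0.0191.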
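`benchmark.py` is not shown, so the warm-up count, the number of timed runs, and the percentile rule behind the table are unknown. Here is a minimal sketch assuming wall-clock timing of a callable, numpy-style linear-interpolation percentiles, and RTF computed as average latency over audio duration. The optional `sync` hook is hypothetical. On Ascend you would likely pass `torch.npu.synchronize`, because device work is queued asynchronously and timing without a sync can under-report latency.

```python
import math
import time


def percentile(values, q):
    """Linear-interpolation percentile (the same rule as numpy's default method)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def time_call(fn, warmup=3, iters=10, sync=None):
    """Return per-call latencies of fn() in milliseconds after warm-up runs."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(iters):
        if sync:
            sync()  # drain pending device work before starting the clock
        start = time.perf_counter()
        fn()
        if sync:
            sync()  # wait for this call's device work to finish
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms, audio_duration_sec):
    avg = sum(latencies_ms) / len(latencies_ms)
    return {
        "avg_latency_ms": avg,
        "min_latency_ms": min(latencies_ms),
        "max_latency_ms": max(latencies_ms),
        "p50_latency_ms": percentile(latencies_ms, 50),
        "p90_latency_ms": percentile(latencies_ms, 90),
        "p95_latency_ms": percentile(latencies_ms, 95),
        "audio_duration_sec": audio_duration_sec,
        # RTF < 1 means processing is faster than real time
        "real_time_factor": avg / (audio_duration_sec * 1000.0),
    }
```

In practice `fn` would be a lambda wrapping a single recognition call on the test file, and the resulting latencies would be passed to `summarize` together with the clip's duration.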

Project Structure

```text
iic-speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020-NPU/
├── assets/
│   └── test.wav
├── logs/
│   ├── env_check.log
│   ├── inference.log
│   ├── eval_consistency.log
│   └── benchmark.log
├── screenshots/
│   └── self_verification.png
├── models/
├── model_utils.py
├── inference.py
├── eval_consistency.py
├── benchmark.py
├── requirements.txt
├── .gitignore
└── README.md
```

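How to Run

```bash
pip install -r requirements.txt   # install dependencies
python inference.py               # run NPU inference
python eval_consistency.py        # check CPU vs. NPU numerical consistency
python benchmark.py               # measure latency and real-time factor
```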

Tags

#NPU