from funasr.auto.auto_model import AutoModel
import soundfile as sf

# Load the Paraformer-large model on the first Ascend NPU.
model = AutoModel(
    model="iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    device="npu:0",
)

# The model is a 16 kHz model; the sample rate returned by sf.read is discarded here.
audio, _ = sf.read("audio.wav", dtype="float32")
result = model.generate(input=audio)
print(result[0]["text"])
3.3 INT8 ONNX Inference (CPU, smallest model)
import onnxruntime as ort
import numpy as np
from inference import extract_features, ctc_greedy_decode

# Load the INT8-quantized model on the CPU execution provider.
session = ort.InferenceSession("model_quant.onnx", providers=["CPUExecutionProvider"])

feats = extract_features("audio.wav")  # mel + LFR + MVN → (T, 560)
speech = np.expand_dims(feats, 0).astype(np.float32)  # add batch dimension: (1, T, 560)
speech_len = np.array([speech.shape[1]], dtype=np.int32)  # number of frames T
logits = session.run(None, {"speech": speech, "speech_lengths": speech_len})[0]
text = ctc_greedy_decode(logits[0])  # decode the only item in the batch
print(text)
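
The two helpers imported from inference.py are not shown in this document. As a rough illustration of what they might do, the sketch below stacks 80-dimensional mel frames into 560-dimensional low-frame-rate (LFR) features and performs a CTC-style greedy decode. The function names `apply_lfr` and `greedy_decode`, the window of 7 frames with a stride of 6, the blank id of 0, and the toy token list are all assumptions made for illustration; the real script may differ.

```python
import numpy as np


def apply_lfr(feats, lfr_m=7, lfr_n=6):
    """Stack lfr_m consecutive frames with stride lfr_n (assumed values).

    feats: (T, D) array, e.g. D = 80 mel bins, giving (ceil(T / lfr_n), D * lfr_m).
    """
    t = feats.shape[0]
    t_out = int(np.ceil(t / lfr_n))
    left = (lfr_m - 1) // 2
    # Pad on the left by repeating the first frame.
    padded = np.concatenate([np.repeat(feats[:1], left, axis=0), feats], axis=0)
    # Pad on the right by repeating the last frame so every window is full.
    needed = (t_out - 1) * lfr_n + lfr_m
    right = max(0, needed - padded.shape[0])
    padded = np.concatenate([padded, np.repeat(feats[-1:], right, axis=0)], axis=0)
    windows = [padded[i * lfr_n: i * lfr_n + lfr_m].reshape(-1) for i in range(t_out)]
    return np.stack(windows).astype(np.float32)


def greedy_decode(logits, tokens, blank_id=0):
    """CTC-style greedy decode: argmax per frame, collapse repeats, drop blanks.

    logits: (T, V) array; tokens: list of V strings (hypothetical vocabulary).
    """
    out = []
    prev = None
    for idx in logits.argmax(axis=-1):
        idx = int(idx)
        if idx != prev and idx != blank_id:
            out.append(tokens[idx])
        prev = idx
    return "".join(out)


if __name__ == "__main__":
    mel = np.random.rand(100, 80).astype(np.float32)  # fake mel features
    print(apply_lfr(mel).shape)

    toy_tokens = ["<blank>", "a", "b"]  # toy vocabulary, not the real one
    toy_logits = np.eye(3)[[1, 1, 0, 1, 2, 2]]
    print(greedy_decode(toy_logits, toy_tokens))
```

With 80 mel bins and a 7-frame window, each stacked frame has 80 × 7 = 560 dimensions, which matches the (T, 560) shape noted in the comment above.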
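NPU (Ascend 910): throughput of 2.12 inferences/s, 2.11× faster than FP32 on CPU, with the most stable performance (standard deviation 0.002 s)
NPU first inference: incurs JIT compilation overhead (about 0.78 s); steady-state latency then drops to about 0.47 s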
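
The figures above are mutually consistent: a steady-state latency of about 0.47 s corresponds to roughly 1 / 0.47 ≈ 2.13 inferences per second. The measurement procedure itself is not described here, so the following is only a minimal sketch of one way to separate first-call overhead from steady-state latency. The `benchmark` function and its result keys are hypothetical names, and `fn` stands in for any zero-argument inference call.

```python
import statistics
import time


def benchmark(fn, runs=10):
    """Time fn() once for first-call latency, then `runs` times for steady state."""
    start = time.perf_counter()
    fn()  # first call includes any one-off JIT or graph compilation cost
    first_call = time.perf_counter() - start

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)

    mean = statistics.mean(latencies)
    return {
        "first_call_s": first_call,
        "mean_s": mean,
        "std_s": statistics.stdev(latencies) if runs > 1 else 0.0,
        "throughput_per_s": 1.0 / mean,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with e.g. lambda: model.generate(input=audio).
    stats = benchmark(lambda: sum(i * i for i in range(100_000)), runs=5)
    for key, value in stats.items():
        print(f"{key}: {value:.4f}")
```

Reporting the first call separately keeps one-off compilation time from inflating the steady-state mean and standard deviation.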
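7. Deliverables

| File | Description |
| --- | --- |
| inference.py | Inference script (supports multiple backends: ONNX on CPU, FunASR on NPU) |
| README.md | Deployment guide and evaluation report (this file) |
| benchmark_results.json | Performance benchmark data |
| accuracy_results.json | Accuracy evaluation data |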
8. Citation
@inproceedings{gao2023funasr,
  title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  author={Gao, Zhifu and Zhang, Shiliang and others},
  booktitle={INTERSPEECH},
  year={2023}
}
@article{gao2020paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and others},
  journal={arXiv preprint arXiv:2206.08317},
  year={2022}
}