faster-whisper-small Ascend NPU 适配

基于 faster-whisper (CTranslate2 格式的 OpenAI Whisper 模型) 在华为昇腾 NPU 上的推理适配。

模型简介

faster-whisper 是 OpenAI Whisper 模型使用 CTranslate2 重新实现的版本，在 CPU/GPU 上比原始 Whisper 快 4 倍，内存占用更低。本仓库在原模型基础上完成了 华为昇腾 Ascend NPU 推理适配，实现同精度下 88 倍以上推理加速。

原始模型: dev4life/faster-whisper
推理后端: openai-whisper + PyTorch + torch_npu
模型大小: Whisper Small (461 MB)
支持语言: 99 种语言 (含中英文)
适配框架: torch_npu 2.9.0, CANN 8.5.1

硬件要求

硬件	规格
NPU 型号	Atlas 800 A2 / 910B
CANN 版本	CANN 8.5.1+
torch_npu	2.9.0+
Python	3.11+

快速开始

环境安装

# 安装依赖
pip install openai-whisper torch_npu soundfile scipy
pip install faster-whisper  # 可选，用于 CPU 基线对比

# 下载模型
pip install modelscope
modelscope download --model dev4life/faster-whisper

推理示例

import whisper
import torch_npu
import soundfile as sf

# 加载模型到 NPU
model = whisper.load_model("small")
model = model.to("npu:0")
model.eval()

# 加载音频 (16kHz mono)
audio, sr = sf.read("audio.wav", dtype="float32")
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# 推理
result = model.transcribe(audio, fp16=False)
print(result["text"])

CLI 推理

# NPU 推理
python inference.py --audio audio.wav --device npu

# CPU 基线推理 (faster-whisper)
python inference.py --audio audio.wav --device cpu --backend faster-whisper

# 精度对比 (同模型 NPU vs CPU)
python inference.py --audio audio.wav --compare

# 跨后端对比 (NPU: openai-whisper vs CPU: faster-whisper)
python inference.py --audio audio.wav --compare --cross-backend

# 生成测试音频
python inference.py --gen-test-audio test.wav

# 性能基准测试
python benchmark.py

精度验证

使用同模型 (openai-whisper) 在 NPU 与 CPU 上进行精度对比：

指标	结果
对比模式	同模型 (openai-whisper) NPU vs CPU
NPU 设备	npu:0
字符匹配率	100.0000%
误差率	0.0000%
判定	PASS (阈值 < 1%)

设备	输出文本	耗时
NPU	欢迎大家来体验打摩院推出的语音识别模型	0.87s
CPU	欢迎大家来体验打摩院推出的语音识别模型	19.16s

精度验证脚本: accuracy_test.py
精度结果: eval/accuracy_result.json

性能基准

指标	数值
NPU 平均推理时间 (10 runs)	0.3067s
CPU 平均推理时间 (10 runs)	27.2468s
推理加速比	88.85x
NPU 吞吐量	18.09x 实时
NPU 标准差	0.0082s

性能基准脚本: benchmark.py
性能结果: eval/benchmark_result.json

交付件清单

文件	说明
`inference.py`	推理脚本，支持 NPU/CPU/精度对比
`accuracy_test.py`	精度测试脚本
`benchmark.py`	性能基准测试脚本
`README.md`	部署文档
`eval/accuracy_result.json`	精度验证结果
`eval/benchmark_result.json`	性能基准结果
`eval/test_audio.wav`	测试音频样本

技术方案

本适配采用 openai-whisper + torch_npu 方案在 Ascend NPU 上运行 Whisper 模型推理：

模型加载: 使用 openai-whisper 加载标准 Whisper small 模型权重
NPU 部署: 通过 model.to("npu:0") 将模型迁移至 Ascend NPU
音频预处理: 使用 soundfile 加载音频，重采样至 16kHz mono
推理: 调用 model.transcribe() 执行 Encoder-Decoder 推理
精度保证: 同模型 NPU 与 CPU 输出 100% 字符匹配

引用

@misc{faster-whisper-npu,
  title={faster-whisper Ascend NPU Adaptation},
  author={Model Agent},
  year={2026},
  howpublished={\url{https://www.modelscope.cn/models/dev4life/faster-whisper}},
}