faster-whisper-medium-npu

Overview

An automatic speech recognition (ASR) model based on faster-whisper-medium, adapted to run on Ascend NPUs.

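Original model: pengzhendong/faster-whisper-medium
Task: automatic speech recognition (ASR)
Framework: PyTorch (openai-whisper)
Input format: 16 kHz mono WAV audio
Output format: recognized text (SRT or plain text)
Languages: multilingual (99+ languages)
NPU adaptation: Ascend910 (torch_npu)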
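
The SRT output format listed above consists of numbered cues, each with a `start --> end` line in `HH:MM:SS,mmm` form followed by the text. The following is a minimal sketch of that formatting, assuming segments are dicts with `start` and `end` (in seconds) and `text` keys, as in the `segments` list returned by openai-whisper's `transcribe`. The helper names `srt_timestamp` and `segments_to_srt` are hypothetical and are not taken from this repository's scripts.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

For plain-text output, joining the `text` fields of the segments would be enough.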

Requirements

Python 3.8+
PyTorch + torch_npu (Ascend NPU)
CANN (Ascend AI processor software stack)
openai-whisper
soundfile + librosa + numpy

Installing dependencies

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt

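NPU adaptation notes

The model runs on the openai-whisper framework. Keep the following points in mind when adapting it:

Mel filter precision: whisper's mel filters use float64 by default and must be converted to float32 on the Ascend NPU.
Model migration: after loading the whisper model, move it to the NPU device with .to("npu").
Mixed precision: enabling fp16 for NPU inference can further improve performance.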
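
The three notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in inference.py: the function names are hypothetical, the checkpoint name "medium" is assumed (the repository may load local weights instead), and `to_device_float32` is a generic helper for any float64 tensor, such as a mel filterbank, rather than a patch to whisper itself.

```python
def pick_device(prefer: str = "npu") -> str:
    """Return "npu" if torch_npu imports and a device is visible, else "cpu"."""
    if prefer == "npu":
        try:
            import torch
            import torch_npu  # noqa: F401  (importing it registers the "npu" backend)

            if torch.npu.is_available():
                return "npu"
        except (ImportError, AttributeError, OSError, RuntimeError):
            # torch_npu or CANN is missing or unusable: fall back to CPU.
            pass
    return "cpu"


def to_device_float32(tensor, device: str):
    """Cast a float64 tensor to float32 before moving it to the device."""
    import torch

    if tensor.dtype == torch.float64:
        tensor = tensor.to(torch.float32)
    return tensor.to(device)


def transcribe(audio_path: str, language: str = "en", prefer: str = "npu") -> str:
    """Load the model, move it to the chosen device, and transcribe one file."""
    import whisper  # openai-whisper

    device = pick_device(prefer)
    # Assumption: the stock "medium" checkpoint; load on CPU, then migrate.
    model = whisper.load_model("medium", device="cpu").to(device)
    # Request fp16 only on the NPU; CPU inference stays in fp32.
    result = model.transcribe(audio_path, language=language, fp16=(device == "npu"))
    return result["text"].strip()
```

Loading on the CPU first and then calling `.to(device)` keeps the migration step explicit, which mirrors the order described in the notes.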

Usage

1. NPU inference

python3 inference.py --audio <audio file> --language en

The test audio real_sample_16k.wav is used by default.
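
Because the model expects 16 kHz mono WAV input, it can be useful to verify a file before running inference. A minimal sketch using only the standard library follows; `check_wav` is a hypothetical helper, not part of this repository, and the `wave` module handles uncompressed PCM WAV files.

```python
import wave


def check_wav(path: str, rate: int = 16000, channels: int = 1) -> bool:
    """Return True if the WAV file has the expected sample rate and channel count."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == rate and wav.getnchannels() == channels
```

A file that fails the check could be converted with, for example, `librosa.load(path, sr=16000, mono=True)` and written back out with soundfile, both of which are already listed in the requirements.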

2. CPU vs NPU accuracy comparison

python3 compare_cpu_npu.py --audio <audio file>

The script runs inference on the CPU and on the NPU, compares the output text, and computes the word error rate (WER).

Inference results

Tested on a 6.62-second English speech sample:

Metric	CPU	NPU
Inference time	34.163 s	0.755 s
Speedup	1.0x	45.24x

CPU output

After early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels.

NPU output

After early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels.

CPU/NPU accuracy test

Test method

Same audio input (16 kHz mono WAV)
Same whisper model weights
Inference run separately on the CPU and on the NPU
Transcripts compared and word error rate (WER) computed
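
WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference and the hypothesis, divided by the number of reference words. The sketch below shows one way to compute it, taking the CPU transcript as the reference. The case-insensitive whitespace tokenization is an assumption; compare_cpu_npu.py may normalize the text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With identical transcripts the distance is zero, so the WER is 0.00% regardless of sentence length.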

Accuracy test results

Metric	Value
Exact match	True
Word error rate (WER)	0.00%
CPU word count	18
NPU word count	18

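Conclusion

The NPU and CPU transcripts are identical: WER = 0.00%, below the 1% threshold, so the accuracy check passes.

On this sample, Ascend NPU inference matches CPU inference exactly, which indicates that its accuracy is sufficient for this speech recognition task.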

Performance test

Device	Inference time	Speedup
CPU	34.163 s	1.0x (baseline)
NPU (Ascend910)	0.755 s	45.24x

On this sample, NPU inference is about 45.2x faster than CPU inference.
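
The speedup is the CPU time divided by the NPU time. As a small worked example, the snippet below recomputes it from the table and also derives the real-time factor (processing time divided by audio duration), a common ASR metric that this README does not report; the function names are illustrative only.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


def real_time_factor(elapsed_s: float, audio_s: float) -> float:
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return elapsed_s / audio_s


cpu_s, npu_s, audio_s = 34.163, 0.755, 6.62  # values from the tables above
print(f"speedup: {speedup(cpu_s, npu_s):.1f}x")
print(f"CPU real-time factor: {real_time_factor(cpu_s, audio_s):.2f}")
print(f"NPU real-time factor: {real_time_factor(npu_s, audio_s):.2f}")
```

The rounded timings in the table give a ratio of roughly 45.2; the 45.24x figure was presumably computed from the unrounded measurements.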

Files

faster-whisper-medium-npu/
├── inference.py              # NPU inference script
├── compare_cpu_npu.py        # CPU vs NPU accuracy comparison script
├── requirements.txt          # Dependency list
├── README.md                 # This document (originally in Chinese)
└── terminal_screenshot.png   # Screenshot of a terminal run

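Evidence of successful inference

This repository provides a complete inference script that runs on both CPU and NPU:

# NPU inference
python3 inference.py --device npu

# CPU inference
python3 inference.py --device cpu

When inference finishes, the script prints the transcript and the elapsed time, confirming that the model ran successfully on the NPU.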
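
For readers who want to see how such a command line could be wired up, here is a hedged sketch of the argument parsing and timing. The flag names `--audio`, `--language`, and `--device` and the default audio file come from this README; the default language, the help strings, and the helpers `build_parser` and `timed` are assumptions, and the actual inference.py may differ.

```python
import argparse
import time


def build_parser() -> argparse.ArgumentParser:
    """Command-line options mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Whisper inference on CPU or Ascend NPU")
    parser.add_argument("--audio", default="real_sample_16k.wav",
                        help="path to a 16 kHz mono WAV file")
    parser.add_argument("--language", default="en",
                        help="language code (the default here is an assumption)")
    parser.add_argument("--device", choices=["npu", "cpu"], default="npu",
                        help="device to run inference on")
    return parser


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

A script built this way would parse the arguments, wrap the transcription call in `timed`, and print both the text and the elapsed time.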

Model tags

#+NPU #+Speech #+ASR #+Ascend #+Whisper #+Automatic-Speech-Recognition

References