kotoba-whisper-v2.0 on Ascend NPU

1. Introduction

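This document records how the kotoba-whisper-v2.0 model was deployed and validated on Huawei Ascend NPUs (Atlas 800 A2/A3), and the results. The model is a Japanese automatic speech recognition (ASR) model based on the Whisper architecture, developed by the kotoba-tech team.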

The model class is WhisperForConditionalGeneration, with a 32-layer encoder and a 2-layer decoder (a Distil-Whisper variant), for roughly 756M parameters.

The weights come from an optimized training run based on kotoba-whisper v2.0.

2. Validation Environment

| Component | Version |
|---|---|
| `torch-npu` | `2.9.0.post1+gitee7ba04` |
| `transformers` | `4.57.6` |
| `soundfile` | latest |
| `librosa` | latest |
| CANN | 8.5.1 |

- NPU: Ascend910B4 (1 card)
- Model path: HuggingFace Hub cache (downloaded through the hf-mirror.com mirror)
- Inference method: direct inference with transformers on torch_npu

3. Running Inference

Environment setup

```bash
pip install transformers torch_npu soundfile librosa
```
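To confirm that an installed environment matches the validated versions listed above, a small check along these lines can help. It is a minimal sketch using only `importlib.metadata`; the `VALIDATED` dictionary and `check_versions` helper are illustrative names, not part of the deliverables, and CANN is not a pip package, so it is not checked here.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the validation environment table.
# None means "latest" in the table: the version is reported but not compared.
VALIDATED = {
    "torch-npu": "2.9.0.post1+gitee7ba04",
    "transformers": "4.57.6",
    "soundfile": None,
    "librosa": None,
}


def check_versions(expected=VALIDATED):
    """Return {package: (installed_version_or_None, matches_or_None)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        # A match is only meaningful when both a pinned and an installed version exist.
        match = None if (want is None or have is None) else have == want
        report[pkg] = (have, match)
    return report


if __name__ == "__main__":
    for pkg, (have, match) in check_versions().items():
        print(f"{pkg:14s} installed={have} matches_validated={match}")
```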

Model download

```bash
HF_ENDPOINT=https://hf-mirror.com python3 -c "
from transformers import WhisperForConditionalGeneration, WhisperProcessor
model = WhisperForConditionalGeneration.from_pretrained('kotoba-tech/kotoba-whisper-v2.0')
processor = WhisperProcessor.from_pretrained('kotoba-tech/kotoba-whisper-v2.0')
# The weights are cached automatically in the HuggingFace Hub cache directory
"
```

Run inference

```bash
python3 inference.py --audio speech.wav --language ja --task transcribe
```

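| Argument | Description | Default |
|---|---|---|
| `--audio` | Path to the input audio file (.wav) | required |
| `--language` | Language code | `ja` |
| `--task` | Task type (transcribe/translate) | `transcribe` |
| `--device` | Inference device (npu/cpu) | `npu` |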
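The argument table corresponds to a command-line interface roughly like the following. This is a hypothetical `argparse` sketch reconstructed from the table (the `build_parser` name is illustrative); the actual `inference.py` may declare its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the argument table; not the shipped inference.py."""
    parser = argparse.ArgumentParser(description="kotoba-whisper-v2.0 inference")
    parser.add_argument("--audio", required=True, help="path to the input .wav file")
    parser.add_argument("--language", default="ja", help="language code")
    parser.add_argument("--task", default="transcribe",
                        choices=["transcribe", "translate"], help="task type")
    parser.add_argument("--device", default="npu", choices=["npu", "cpu"],
                        help="inference device")
    return parser
```

For example, `build_parser().parse_args(["--audio", "speech.wav"])` fills in `ja`, `transcribe` and `npu` for the omitted options, and an unsupported value such as `--device gpu` is rejected by `choices`.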

4. Smoke Test

Basic functionality check:

```bash
python3 inference.py --audio test.wav
```

Results:

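- The model loads successfully and inference completes normally
- Japanese transcription text is produced
- NPU utilization is normal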
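The check that Japanese transcription text is produced can be automated with a simple script-range heuristic. The sketch below is hypothetical: the `japanese_ratio` and `smoke_check` helpers and the 0.5 threshold are assumptions, and the ranges cover only hiragana, katakana and the main CJK ideograph block (punctuation such as 。 is not counted).

```python
# Unicode blocks treated as "Japanese script" by this heuristic (an assumption).
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)


def japanese_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in a Japanese script range."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(any(lo <= ord(c) <= hi for lo, hi in JAPANESE_RANGES) for c in chars)
    return hits / len(chars)


def smoke_check(text: str, min_ratio: float = 0.5) -> bool:
    """Hypothetical pass criterion: non-empty output that is mostly Japanese script."""
    return japanese_ratio(text) >= min_ratio
```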

5. Performance Reference

Test conditions: 3 s of synthetic audio (16 kHz), a single Ascend910B4 card, max_new_tokens=128.

| Metric | CPU | NPU |
|---|---|---|
| Inference time | 124.4 s | 18.7 s |
| Speedup | - | 6.6x |
| RTF (Real-Time Factor) | 41.47 | 6.24 |

RTF = inference time / audio duration; lower is better. RTF < 1 means the audio is processed faster than real time.
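The CPU column follows directly from this definition: 124.4 s of inference for 3 s of audio gives an RTF of about 41.47. The sketch below shows one way to time a callable with warm-up runs and derive RTF and speedup. The helper names, the warm-up and run counts, and the choice to exclude warm-up runs are assumptions; the document does not state whether `eval_performance.py` excludes the first (compiling) NPU run.

```python
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], warmup: int = 1, runs: int = 3) -> float:
    """Mean wall-clock seconds per call of `fn`, after `warmup` discarded calls.

    Warm-up calls absorb one-off costs such as first-run compilation on the NPU.
    On an accelerator, `fn` should end with something that forces completion
    (for example decoding the generated token IDs to text); otherwise the timer
    may stop before the device has finished.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def rtf(inference_seconds: float, audio_seconds: float) -> float:
    """Real-Time Factor: inference time divided by audio duration (lower is better)."""
    return inference_seconds / audio_seconds


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds
```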

6. Accuracy Evaluation

Using synthetic test audio (3 s, 16 kHz, 200-800 Hz sweep), the NPU and CPU inference outputs are compared token by token at the logits level.
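A test signal of this kind can be generated with the standard library alone. This sketch assumes a linear sweep, an amplitude of 0.5 and 16-bit PCM; the document specifies only 3 s, 16 kHz and 200-800 Hz, and `linear_sweep` and `to_wav_bytes` are hypothetical helpers, not necessarily what `eval_accuracy.py` uses.

```python
import io
import math
import struct
import wave


def linear_sweep(duration_s=3.0, sample_rate=16000, f0=200.0, f1=800.0, amplitude=0.5):
    """Linear chirp from f0 to f1 Hz as float samples in [-amplitude, amplitude]."""
    n = int(duration_s * sample_rate)
    k = (f1 - f0) / duration_s  # sweep rate in Hz per second
    samples = []
    for i in range(n):
        t = i / sample_rate
        # The phase is the integral of the instantaneous frequency f0 + k*t.
        phase = 2.0 * math.pi * (f0 * t + 0.5 * k * t * t)
        samples.append(amplitude * math.sin(phase))
    return samples


def to_wav_bytes(samples, sample_rate=16000):
    """Encode float samples in [-1, 1] as an in-memory 16-bit PCM mono WAV file."""
    pcm = struct.pack(
        f"<{len(samples)}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```

Writing `to_wav_bytes(linear_sweep())` to a file such as `sweep.wav` (a hypothetical name) yields a 3 s, 16 kHz mono input that needs no resampling.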

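| Metric | Value |
|---|---|
| Text match | PASS |
| Cosine similarity (logits) | 0.999999 |
| Max absolute error (logits) | 0.044278 |
| Mean absolute error (logits) | 0.008331 |
| Relative error (vs. signal range) | 0.2008% |
| Verdict | PASS (< 1%) |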

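The transcribed text is identical to the CPU reference; the logits cosine similarity is 0.999999, and the relative error is well below the 1% threshold.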
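The metrics in the table can be computed from flattened CPU and NPU logits roughly as sketched below. The definition of "relative error vs. signal range" is an assumption here (maximum absolute error divided by the range of the CPU reference logits), and `compare_logits` and `passes` are illustrative names; `eval_accuracy.py` may define these quantities differently.

```python
import math
from typing import Sequence


def compare_logits(ref: Sequence[float], test: Sequence[float]) -> dict:
    """Compare flattened logits from a reference (CPU) run and a test (NPU) run."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    diffs = [abs(a - b) for a, b in zip(ref, test)]
    dot = sum(a * b for a, b in zip(ref, test))
    norm = math.sqrt(sum(a * a for a in ref)) * math.sqrt(sum(b * b for b in test))
    value_range = max(ref) - min(ref)
    max_abs = max(diffs)
    return {
        "cosine": dot / norm if norm else float("nan"),
        "max_abs_err": max_abs,
        "mean_abs_err": sum(diffs) / len(diffs),
        # Assumption: relative error = max abs error / range of the reference logits.
        "rel_err_pct": 100.0 * max_abs / value_range if value_range else float("nan"),
    }


def passes(metrics: dict, threshold_pct: float = 1.0) -> bool:
    """Pass when the relative error is below the threshold (1% in this document)."""
    return metrics["rel_err_pct"] < threshold_pct
```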

Run the accuracy evaluation:

```bash
python3 eval_accuracy.py
```

Run the performance benchmark:

```bash
python3 eval_performance.py
```

7. Notes

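- Feature extraction runs on the CPU: WhisperProcessor computes the log-Mel spectrogram features with NumPy on the CPU. Only the model's forward/generate calls run on the NPU.
- The first inference is slow: the first run on the NPU includes compilation and optimization; later runs have stable latency.
- Sampling rate: the model expects 16 kHz mono input. inference.py automatically converts multi-channel audio to mono and resamples other sampling rates to 16 kHz.
- Maximum length: max_new_tokens defaults to 256 and can be raised for long audio. By default, WhisperProcessor also pads or truncates the input to 30 seconds, so audio longer than 30 s additionally needs chunked or long-form decoding.
- Weight download: on networks in mainland China, set HF_ENDPOINT=https://hf-mirror.com to download through the HuggingFace mirror.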
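Taken together, these notes describe an inference path that can be sketched as follows: audio is loaded and converted to 16 kHz mono, log-Mel features are computed on the CPU, and only `generate` runs on the device. This is a hypothetical outline, not the shipped `inference.py`; the `pick_device` and `transcribe` names are illustrative, and the silent fallback to CPU when `torch_npu` is unavailable is a design choice of this sketch.

```python
def pick_device(requested: str = "npu") -> str:
    """Use the NPU only if requested and torch_npu imports; otherwise fall back to CPU."""
    if requested != "npu":
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers the "npu" device with torch)
    except (ImportError, OSError):
        return "cpu"
    return "npu" if torch.npu.is_available() else "cpu"


def transcribe(path: str, language: str = "ja", task: str = "transcribe",
               device: str = "npu", max_new_tokens: int = 256) -> str:
    # Heavy dependencies are imported lazily so pick_device() works without them.
    import librosa
    import soundfile as sf
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    model_id = "kotoba-tech/kotoba-whisper-v2.0"
    device = pick_device(device)

    audio, sr = sf.read(path, dtype="float32")  # shape (frames,) or (frames, channels)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)  # downmix to mono
    if sr != 16000:
        audio = librosa.resample(audio, orig_sr=sr, target_sr=16000)

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device).eval()

    # Log-Mel features are computed with NumPy on the CPU, then moved to the device.
    features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
    with torch.no_grad():
        ids = model.generate(features.to(device), language=language, task=task,
                             max_new_tokens=max_new_tokens)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

A call such as `print(transcribe("speech.wav"))` would then mirror the command-line example in section 3.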

8. Deliverables

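| File | Description |
|---|---|
| `inference.py` | NPU inference script (supports both NPU and CPU) |
| `eval_accuracy.py` | Accuracy evaluation script (NPU vs. CPU logits comparison) |
| `eval_performance.py` | Performance benchmark script |
| `README.md` | This document |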