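This repository adapts Qwen3-ASR-0.6B to Ascend NPUs and validates the result.
Qwen3-ASR-0.6B is a lightweight speech recognition model open-sourced by the Qwen team, with support for 52 languages and dialects. This adaptation lets it run efficiently on Ascend NPUs with only minimal code changes.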
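```bash
# Basic inference (Transformers backend)
pip install -U qwen-asr
# The vLLM backend needs the extra vllm dependencies
pip install -U "qwen-asr[vllm]"
```

Note: the runtime image used for validation ships with `vllm` and `vllm-ascend` preinstalled, so the `qwen-asr[vllm]` install command above was not run separately during validation. If your environment does not have these two packages, run that command to enable the vLLM backend.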
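Whether the extra install step is needed can be checked from Python before touching the environment. The snippet below is a minimal sketch, not part of this repository: the helper names `installed_version` and `missing_packages` are made up for illustration, and the distribution names `vllm` and `vllm-ascend` are assumed to match how the packages are published in your image.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def missing_packages(dist_names):
    """List the distributions from dist_names that are not installed."""
    return [name for name in dist_names if installed_version(name) is None]


if __name__ == "__main__":
    # Assumed distribution names; adjust if your image packages them differently.
    needed = ["vllm", "vllm-ascend"]
    missing = missing_packages(needed)
    if missing:
        print("Missing:", ", ".join(missing), '-> run: pip install -U "qwen-asr[vllm]"')
    else:
        print("vLLM backend dependencies already installed")
```

If the script reports nothing missing, the vLLM install command can be skipped, as it was during validation.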
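Model weights can be downloaded from the AtomGit mirror:

```bash
python3 -m atomgit download hf_mirrors/Qwen/Qwen3-ASR-0.6B -d /opt/atomgit/weight/Qwen3-ASR-0.6B
python3 -m atomgit download hf_mirrors/Qwen/Qwen3-ForcedAligner-0.6B -d /opt/atomgit/weight/Qwen3-ForcedAligner-0.6B
```

or from Hugging Face:

```bash
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir ./Qwen3-ASR-0.6B
huggingface-cli download Qwen/Qwen3-ForcedAligner-0.6B --local-dir ./Qwen3-ForcedAligner-0.6B
```

The examples below load weights from `/opt/atomgit/weight/`; adjust the paths if you download elsewhere.

Compared with the original GPU script, the NPU adaptation needs only two changes: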
- Add `import torch_npu`.
- Change `device_map="cuda:0"` to `device_map="npu:0"`.

Quick NPU inference (transformers backend):

```python
import torch
import torch_npu  # registers the "npu" device with PyTorch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "/opt/atomgit/weight/Qwen3-ASR-0.6B",
    dtype=torch.bfloat16,
    device_map="npu:0",
    max_inference_batch_size=32,
    max_new_tokens=256,
)

results = model.transcribe(
    audio="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav",
    language=None,  # no language hint; the result reports the language
)

print(results[0].language)
print(results[0].text)
```

Example output:

```text
English
Hmm. Oh yeah, yeah. He wasn't even that big when I started listening to him, but and his solo music didn't do overly well, but he did very well when he started writing for other people.
```

Batch inference with timestamps (transformers backend):

```python
import torch
import torch_npu
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "/opt/atomgit/weight/Qwen3-ASR-0.6B",
    dtype=torch.bfloat16,
    device_map="npu:0",
    max_inference_batch_size=32,
    max_new_tokens=256,
    # The forced aligner supplies the timestamps and also runs on the NPU
    forced_aligner="/opt/atomgit/weight/Qwen3-ForcedAligner-0.6B",
    forced_aligner_kwargs=dict(
        dtype=torch.bfloat16,
        device_map="npu:0",
    ),
)

results = model.transcribe(
    audio=[
        "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav",
        "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav",
    ],
    language=["Chinese", "English"],  # one entry per audio input
    return_time_stamps=True,
)

for r in results:
    # Print only the first aligned item of each result
    print(r.language, r.text, r.time_stamps[0] if r.time_stamps else None)
```

Example output:

```text
Chinese 甚至出现交易几乎停滞的情况。 ForcedAlignItem(text='甚', start_time=0.4, end_time=0.72)
English Hmm. Oh yeah, yeah. He wasn't even that big when I started listening to him, but and his solo music didn't do overly well, but he did very well when he started writing for other people. ForcedAlignItem(text='Hmm', start_time=0.48, end_time=0.88)
```

Standalone ForcedAligner usage:

```python
import torch
import torch_npu
from qwen_asr import Qwen3ForcedAligner

model = Qwen3ForcedAligner.from_pretrained(
    "/opt/atomgit/weight/Qwen3-ForcedAligner-0.6B",
    dtype=torch.bfloat16,
    device_map="npu:0",
)

# Align a known transcript against its audio
results = model.align(
    audio="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav",
    text="甚至出现交易几乎停滞的情况。",
    language="Chinese",
)

print(results[0])
print(results[0][0].text, results[0][0].start_time, results[0][0].end_time)
```

Example output:

```text
ForcedAlignResult(items=[ForcedAlignItem(text='甚', start_time=0.4, end_time=0.72), ForcedAlignItem(text='至', start_time=0.72, end_time=0.96), ForcedAlignItem(text='出', start_time=0.96, end_time=1.12), ForcedAlignItem(text='现', start_time=1.12, end_time=1.52), ForcedAlignItem(text='交', start_time=1.52, end_time=1.76), ForcedAlignItem(text='易', start_time=1.76, end_time=2.0), ForcedAlignItem(text='几', start_time=2.0, end_time=2.24), ForcedAlignItem(text='乎', start_time=2.24, end_time=2.48), ForcedAlignItem(text='停', start_time=2.48, end_time=2.72), ForcedAlignItem(text='滞', start_time=2.72, end_time=2.88), ForcedAlignItem(text='的', start_time=2.88, end_time=3.04), ForcedAlignItem(text='情', start_time=3.04, end_time=3.36), ForcedAlignItem(text='况', start_time=3.36, end_time=3.68)])
甚 0.4 0.72
```

Because of API differences between `qwen-asr` and the current `vllm-ascend` version, the vLLM backend needs a compatibility patch injected through `PYTHONPATH`.
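Injection through `PYTHONPATH` works here because the patch lives in a file named `sitecustomize.py`: a standard CPython interpreter imports a module of that name automatically at startup if one is found on the import path. The sketch below demonstrates the mechanism only. It is not the repository's patch; the stand-in patch body, the `DEMO_PATCH_APPLIED` flag, and the helper names are all made up for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_with_patch_dir(patch_dir, code):
    """Run `code` in a fresh interpreter with patch_dir prepended to PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = patch_dir if not existing else patch_dir + os.pathsep + existing
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo():
    with tempfile.TemporaryDirectory() as patch_dir:
        # Stand-in patch: a real one would adjust library APIs instead of setting a flag.
        with open(os.path.join(patch_dir, "sitecustomize.py"), "w") as f:
            f.write("import builtins\nbuiltins.DEMO_PATCH_APPLIED = True\n")
        # The child code never imports sitecustomize; the interpreter does so at startup.
        return run_with_patch_dir(
            patch_dir,
            "import builtins; print(getattr(builtins, 'DEMO_PATCH_APPLIED', False))",
        )


if __name__ == "__main__":
    print(demo())
```

The child process reports whether the flag was set before any user code ran, which is the property a compatibility patch relies on. Interpreters started with `-S` or `-I` skip this step, so the patch would not load under those flags.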
Set `PYTHONPATH` to the patch directory, then run the vLLM script:

```bash
export PYTHONPATH=/opt/atomgit/Qwen3-ASR-0.6B/patch_site:$PYTHONPATH
python3 inference_vllm.py
```

vLLM backend inference:

```python
import torch
import torch_npu
from qwen_asr import Qwen3ASRModel

if __name__ == '__main__':
    model = Qwen3ASRModel.LLM(
        model="/opt/atomgit/weight/Qwen3-ASR-0.6B",
        max_inference_batch_size=128,
        max_new_tokens=4096,
        # The forced aligner is configured separately and runs on the NPU
        forced_aligner="/opt/atomgit/weight/Qwen3-ForcedAligner-0.6B",
        forced_aligner_kwargs=dict(
            dtype=torch.bfloat16,
            device_map="npu:0",
        ),
    )

    results = model.transcribe(
        audio=[
            "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_zh.wav",
            "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav",
        ],
        language=["Chinese", "English"],
        return_time_stamps=True,
    )

    for r in results:
        print(r.language, r.text, r.time_stamps[0] if r.time_stamps else None)
```

Example output:

```text
Chinese 甚至出现交易几乎停滞的情况。 ForcedAlignItem(text='甚', start_time=0.4, end_time=0.72)
English Mhm. Oh yeah, yeah. He wasn't even that big when I started listening to him, but and his solo music didn't do overly well, but he did very well when he started writing for other people. ForcedAlignItem(text='Mhm', start_time=0.4, end_time=0.88)
```

NPU performance benchmark:

```bash
python3 benchmark.py
```

```text
===== Benchmark Results =====
Average latency: 1.972s
Min latency: 1.847s
Max latency: 2.061s
```

Accuracy comparison against the CPU baseline:

```bash
python3 accuracy.py
```

```text
===== CPU (float32) 基线 =====
Language: English
Text: Hmm. Oh yeah, yeah. He wasn't even that big when I started listening to him, but and his solo music didn't do overly well, but he did very well when he started writing for other people.
===== NPU (bfloat16) =====
Language: English
Text: Hmm. Oh yeah, yeah. He wasn't even that big when I started listening to him, but and his solo music didn't do overly well, but he did very well when he started writing for other people.
===== 对比 =====
CPU text length: 185
NPU text length: 185
Match: True
```

On this sample, the NPU (bfloat16) transcript is identical to the CPU float32 baseline (185 characters each, `Match: True`), so the measured error is 0%, well under 1%.
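An exact-match check gives no sense of how far apart two transcripts are once they differ. One common way to put a number on the gap is a character error rate based on edit distance. The sketch below is an illustration under that assumption and is not taken from `accuracy.py`; the function names are made up.

```python
def edit_distance(ref, hyp):
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,              # delete a reference character
                cur[j - 1] + 1,           # insert a hypothesis character
                prev[j - 1] + (r != h),   # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    baseline = "Hmm. Oh yeah, yeah."
    print(char_error_rate(baseline, baseline))               # identical transcripts
    print(char_error_rate(baseline, "Mhm. Oh yeah, yeah."))  # differs in the first word
```

The second call mirrors the one visible difference in the example outputs above, where the vLLM backend prints `Mhm` and the transformers backend prints `Hmm`.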
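| File | Description |
|---|---|
| `inference.py` | Quick NPU inference (transformers backend) |
| `inference_batch_timestamps.py` | Batch inference with timestamps (transformers backend) |
| `inference_forced_aligner.py` | Standalone ForcedAligner usage |
| `inference_vllm.py` | vLLM backend inference |
| `benchmark.py` | NPU performance benchmark |
| `accuracy.py` | Accuracy comparison against the CPU baseline |
| `patch_site/sitecustomize.py` | vllm-ascend compatibility patch |
| `output/` | Run logs |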
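`benchmark.py` is listed here without its source. As a rough sketch of how average, minimum, and maximum latency figures like the ones it reports could be gathered, the helper below times repeated calls to an arbitrary function. The names `measure_latency` and `summarize`, the warm-up count, and the use of `time.sleep` as a stand-in workload are assumptions, not the repository's implementation.

```python
import statistics
import time


def measure_latency(fn, runs=5, warmup=1):
    """Call fn() repeatedly and return per-run wall-clock latencies in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to average, minimum, and maximum."""
    return {
        "avg": statistics.mean(latencies),
        "min": min(latencies),
        "max": max(latencies),
    }


if __name__ == "__main__":
    # Stand-in workload; in practice fn would wrap a transcription call.
    stats = summarize(measure_latency(lambda: time.sleep(0.01), runs=3))
    print(f"Average latency: {stats['avg']:.3f}s")
    print(f"Min latency: {stats['min']:.3f}s")
    print(f"Max latency: {stats['max']:.3f}s")
```

Discarding warm-up runs keeps one-off costs such as model loading or first-call compilation out of the reported numbers.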
Citation:

```bibtex
@article{Qwen3-ASR,
  title={Qwen3-ASR Technical Report},
  author={Xian Shi and Xiong Wang and Zhifang Guo and Yongqi Wang and Pei Zhang and Xinyu Zhang and Zishan Guo and Hongkun Hao and Yu Xi and Baosong Yang and Jin Xu and Jingren Zhou and Junyang Lin},
  journal={arXiv preprint arXiv:2601.21337},
  year={2026}
}
```