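MedASR Ascend NPU Deployment Guide

Project overview

MedASR is a medical speech recognition model developed by Google. It is built on the Conformer architecture and trained with a CTC (Connectionist Temporal Classification) loss. The model is optimized for medical terminology and suits scenarios such as radiology dictation and transcription of doctor-patient conversations.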
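To illustrate what a CTC output looks like once it is turned into tokens, here is a minimal sketch of greedy CTC decoding: consecutive repeated frame labels are merged, then blank labels are dropped. The function name and the choice of 0 as the blank id are assumptions for illustration; the actual MedASR decoding path is not shown in this guide and may differ.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse consecutive repeats, then remove blank tokens.

    blank_id=0 is an assumption for this sketch, not a value taken
    from the MedASR configuration.
    """
    # Step 1: merge runs of identical frame labels into one label each.
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: drop the blank label that separates real tokens.
    return [token for token in collapsed if token != blank_id]


# The blank between the two runs of 7 keeps them as separate tokens.
frames = [0, 7, 7, 0, 7, 3, 3, 0]
print(ctc_greedy_decode(frames))  # [7, 7, 3]
```

The blank label is what lets CTC emit the same token twice in a row, which is why the blanks are removed only after the repeats are merged.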

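Features

Inference acceleration on Ascend NPU
CPU vs. NPU precision comparison test (error < 1%)
CTC model based on the Conformer architecture
105M parameters
Medical speech recognition WER: 6.6% (RAD-DICT dataset)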
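For readers unfamiliar with the WER figure above, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name and the sample sentences are illustrative assumptions; this is not the evaluation code behind the reported 6.6%, which may apply text normalization not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("no acute cardiopulmonary process",
                      "no acute pulmonary process"))  # 0.25
```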

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
Docker: container named test-modelagent
transformers: 5.0.0.dev0 (required for full functionality)
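Before running the scripts, it can help to confirm that the Python packages listed above are visible to the interpreter. The sketch below uses only the standard library and looks packages up without importing them. The helper name `missing_packages` is hypothetical, and the check says nothing about versions or whether an NPU is actually usable.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names assumed for the requirements listed above.
required = ["torch", "torch_npu", "transformers"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```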

Directory layout

/data/ysws/agentsp/5-14/medasr-ascend/
├── inference.py          # Precision test script
├── log.txt               # Test log
├── README.md             # This document
├── test_audio_0.wav      # Test audio samples
├── test_audio_1.wav
└── test_audio_2.wav

Deployment steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set the environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

3. Prepare the model files

Place the model files in /data/ysws/agentsp/5-14/medasr/:

model.safetensors - model weights (about 400 MB)
config.json - model configuration
spiece.model - SentencePiece tokenizer
tokenizer.json - tokenizer configuration
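A quick way to confirm the model directory is complete is to check for the four files listed above. This is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical and is not part of inference.py.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = (
    "model.safetensors",
    "config.json",
    "spiece.model",
    "tokenizer.json",
)


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_model_files("/data/ysws/agentsp/5-14/medasr")
print("Missing files:", missing if missing else "none")
```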

4. Run the precision test

cd /data/ysws/agentsp/5-14/medasr-ascend/
python3 inference.py --precision_test

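Usage

Option 1: Standard inference mode

Full speech recognition with MedASR requires the dedicated google-health-medasr package, which provides the custom LASRFeatureExtractor and LASRProcessor. For that reason, the current version only supports precision verification of the model weights.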

cd /data/ysws/agentsp/5-14/medasr-ascend/

# Create test audio samples
python3 inference.py

Option 2: Precision test mode (CPU vs. NPU)

Run the precision comparison test to verify that NPU results are consistent with CPU results:

cd /data/ysws/agentsp/5-14/medasr-ascend/

# Run the full precision test
python3 inference.py --precision_test

# Specify the number of tensors to test
python3 inference.py --precision_test --num_tensors 20

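Command-line arguments

Argument	Description	Default
`--model_path`	Path to the model files	`/data/ysws/agentsp/5-14/medasr`
`--device`	Device to run on	`npu:0`
`--precision_test`	Run in precision test mode	`False`
`--num_tensors`	Number of tensors used in the precision test	`20`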
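The argument table above could be expressed with `argparse` roughly as follows. This is a sketch reconstructed from the table only; the real parser in inference.py is not shown in this guide, so the help strings and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="MedASR precision test (sketch)")
    parser.add_argument("--model_path", default="/data/ysws/agentsp/5-14/medasr",
                        help="Path to the model files")
    parser.add_argument("--device", default="npu:0",
                        help="Device to run on")
    # store_true gives the documented default of False.
    parser.add_argument("--precision_test", action="store_true",
                        help="Run in precision test mode")
    parser.add_argument("--num_tensors", type=int, default=20,
                        help="Number of tensors used in the precision test")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--precision_test", "--num_tensors", "5"])
print(args.device, args.precision_test, args.num_tensors)  # npu:0 True 5
```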

Testing and validation

Precision test results

Metric	Measured	Threshold	Status
Max error (sum)	7.81e-03	< 1.00e+00	PASS
Max error (mean)	1.53e-05	< 1.00e-04	PASS
Max error (std)	1.91e-06	< 1.00e-03	PASS
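The pass/fail logic behind a table like this can be sketched as a comparison of summary statistics against fixed thresholds. The sketch below assumes each error is the absolute difference between the CPU and NPU value of the same statistic (sum, mean, population standard deviation); how inference.py actually computes these numbers is not shown in this guide. Plain Python lists stand in for tensors so the example runs without torch.

```python
import statistics

# Thresholds copied from the table above.
THRESHOLDS = {"sum": 1.0, "mean": 1e-4, "std": 1e-3}


def stat_errors(cpu_values, npu_values):
    """Absolute difference of sum, mean, and std between two value lists.

    Assumption: this mirrors the metric definitions; the real script may differ.
    """
    return {
        "sum": abs(sum(cpu_values) - sum(npu_values)),
        "mean": abs(statistics.fmean(cpu_values) - statistics.fmean(npu_values)),
        "std": abs(statistics.pstdev(cpu_values) - statistics.pstdev(npu_values)),
    }


def passes(errors, thresholds=THRESHOLDS):
    """True when every error is strictly below its threshold."""
    return all(errors[name] < limit for name, limit in thresholds.items())


# Measured maxima from the table above.
reported = {"sum": 7.81e-03, "mean": 1.53e-05, "std": 1.91e-06}
print("PASS" if passes(reported) else "FAIL")  # PASS
```

Note that the sum threshold is an absolute value, so its strictness depends on tensor size: a large tensor can accumulate a larger sum error from the same per-element deviation.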

Performance data

Operation	Duration
Model loading	0.02s
CPU reference computation (20 tensors)	0.03s
NPU inference (20 tensors)	2.05s
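Durations like those above can be collected with a small timing helper. The following is a generic sketch built on `time.perf_counter`; the `timed` helper and the label are hypothetical, and the workload is a stand-in computation rather than the real tensor comparison. When timing work on an accelerator, the device usually has to be synchronized before the clock is stopped, otherwise only the launch time is measured; that step is not shown here.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("cpu_reference", timings):
    # Stand-in workload; the real script would compute tensor statistics here.
    total = sum(i * i for i in range(100_000))

print(f"cpu_reference: {timings['cpu_reference']:.2f}s")
```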

Test log

The full test log is saved in log.txt.

Model structure

Model type: lasr_ctc
Encoder: 17 Conformer layers
Hidden size: 512
Vocabulary size: 512
Mel bins: 128

Component	Description
encoder.layers.*.conv	Conformer convolution module (depthwise + pointwise)
encoder.layers.*.self_attn	Self-attention layer
encoder.layers.*.feed_forward	Feed-forward layer
ctc_head	CTC output layer
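The component patterns in the table can be used to bucket tensor names, for example when summarizing precision results per component. The sketch below uses shell-style wildcards from the standard library. The pattern strings are adapted from the table, with a trailing wildcard on `feed_forward` so that names such as `feed_forward1` also match; that adaptation and the `component_of` helper are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Patterns adapted from the component table; order matters for first match.
COMPONENT_PATTERNS = {
    "conv": "encoder.layers.*.conv.*",
    "self_attn": "encoder.layers.*.self_attn.*",
    "feed_forward": "encoder.layers.*.feed_forward*",
    "ctc_head": "ctc_head.*",
}


def component_of(tensor_name):
    """Return the first component whose pattern matches, else 'other'."""
    for component, pattern in COMPONENT_PATTERNS.items():
        if fnmatchcase(tensor_name, pattern):
            return component
    return "other"


for name in (
    "ctc_head.bias",
    "encoder.layers.0.conv.depthwise_conv.weight",
    "encoder.layers.0.feed_forward1.linear1.weight",
):
    print(f"{name} -> {component_of(name)}")
```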

Per-tensor precision details

Tensor name	Sum Error	Mean Error	Std Error
ctc_head.bias	0.00e+00	0.00e+00	0.00e+00
ctc_head.weight	0.00e+00	0.00e+00	0.00e+00
encoder.layers.0.conv.depthwise_conv.weight	1.91e-05	1.16e-09	0.00e+00
encoder.layers.0.conv.norm.weight	7.81e-03	1.53e-05	0.00e+00
encoder.layers.0.feed_forward1.linear1.weight	2.44e-04	2.33e-10	0.00e+00

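Full functionality

Note: Full MedASR speech recognition requires the google-health-medasr package, which provides the custom LASRFeatureExtractor and LASRProcessor. Network restrictions prevented installing this package from GitHub.

Full functionality includes:

Medical speech-to-text
16 kHz audio input
SentencePiece tokenization
CTC beam search decoding (optional language model)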
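Since the full pipeline expects 16 kHz audio, it can be useful to verify a WAV file's sample rate before feeding it to the model. This is a minimal sketch using the standard-library `wave` module, which handles uncompressed PCM WAV files; the helper names are hypothetical and resampling is out of scope.

```python
import wave

EXPECTED_RATE = 16000  # 16 kHz, as listed above


def wav_sample_rate(path):
    """Read the sample rate from a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path):
    """True when the file's sample rate matches the expected 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE


# Example call, assuming the sample file from the directory layout exists:
# is_16khz("/data/ysws/agentsp/5-14/medasr-ascend/test_audio_0.wav")
```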

To enable full functionality, install the package manually:

pip install git+https://github.com/google-health/medasr.git

FAQ

Q: The precision test fails?

A: Check that the NPU driver is installed correctly and that the CANN environment script has been sourced.
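One way to check whether the CANN environment script has been sourced in the current shell is to look for the variables it exports. The sketch below assumes that set_env.sh sets `ASCEND_TOOLKIT_HOME` or `ASCEND_HOME_PATH`; these names are an assumption about the toolkit's script and should be confirmed against the installed version. The helper name `cann_env_loaded` is hypothetical.

```python
import os

# Assumed variable names exported by the CANN set_env.sh script.
CANN_ENV_VARS = ("ASCEND_TOOLKIT_HOME", "ASCEND_HOME_PATH")


def cann_env_loaded(environ=None):
    """True when at least one assumed CANN variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return any(env.get(name) for name in CANN_ENV_VARS)


if cann_env_loaded():
    print("CANN environment variables detected.")
else:
    print("CANN environment not detected; source set_env.sh first.")
```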

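Q: How do I use the full speech recognition functionality?

A: Install the google-health-medasr package. Installation requires network access to GitHub.

Q: Why is the transcription output empty?

A: The current implementation only verifies NPU/CPU consistency of the model weights and does not include the full inference pipeline.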

References

Original model: https://huggingface.co/google/medasr
Model documentation: https://developers.google.com/health-ai-developer-foundations/medasr
GitHub: https://github.com/google-health/medasr

License

This project is subject to the Google Health AI Developer Foundations license.