Dolphin-small NPU Adaptation

This repository contains inference scripts that adapt the Dolphin-small multilingual speech recognition model to Huawei Ascend NPUs.

Model Information

Item	Value
Model name	DataoceanAI/dolphin-small
Parameters	372 M
Architecture	CTC-Attention (E-Branchformer + Transformer)
Supported languages	40 Eastern languages + 22 Chinese dialects
Source repository	ModelScope

Hardware and Environment

Item	Version/Model
NPU	Ascend 910
CANN	8.5.1
PyTorch	2.9.0+cpu
torch_npu	2.9.0.post1

Installation

# Install core dependencies
pip install torch==2.9.0 torch-npu==2.9.0.post1
pip install dataoceanai-dolphin soundfile

# Download model weights
modelscope download --model DataoceanAI/dolphin-small --local_dir /path/to/model

Files

dolphin-small/
├── inference.py      # NPU inference script
├── benchmark.py      # Performance benchmark
├── accuracy.py       # Accuracy validation (CPU vs NPU)
├── assets/           # Test audio samples
└── output/           # Output logs

Usage

1. NPU Inference

python inference.py \
    --audio assets/test_audio.wav \
    --model_dir /path/to/model \
    --device npu \
    --lang_sym zh \
    --region_sym CN

Example output:

{
    'text': '<zh><CN>欢迎大家来体验达摩院推出的语音识别模型',
    'text_nospecial': '欢迎大家来体验达摩院推出的语音识别模型',
    'language': 'zh',
    'region': 'CN',
    'inference_time_sec': 0.235
}
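The relationship between the fields in the example output can be sketched as follows. This is a minimal illustration, assuming the special tokens are angle-bracket tags with the language first and the region second, as in the sample above; `parse_result_text` is a hypothetical helper, not part of the dolphin package or of inference.py.

```python
import re

# Matches one angle-bracket tag such as <zh> or <CN>.
SPECIAL_TOKEN = re.compile(r"<[^<>]+>")


def parse_result_text(text: str) -> dict:
    """Derive the tag-free text, language, and region from a tagged transcript."""
    tags = SPECIAL_TOKEN.findall(text)  # e.g. ['<zh>', '<CN>']
    return {
        "text": text,
        "text_nospecial": SPECIAL_TOKEN.sub("", text),
        # Assumption: the first tag is the language, the second is the region.
        "language": tags[0][1:-1] if len(tags) > 0 else None,
        "region": tags[1][1:-1] if len(tags) > 1 else None,
    }


if __name__ == "__main__":
    print(parse_result_text("<zh><CN>欢迎大家来体验达摩院推出的语音识别模型"))
```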

2. Performance Benchmark

python benchmark.py \
    --audio assets/test_audio.wav \
    --device npu \
    --iterations 10

Benchmark results (Ascend 910, attention_rescoring, beam_size=10):

Metric	Value
Mean latency	0.23 s
P50 latency	0.23 s
P90 latency	0.23 s
Throughput	4.36 samples/s
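For reference, the metrics in the table can be derived from per-iteration timings roughly as shown below. This is a sketch under stated assumptions, not the code in benchmark.py: `time_calls` and `summarize_latencies` are hypothetical names, percentiles use the nearest-rank rule, and throughput is taken as the reciprocal of the mean latency for single-sample inference.

```python
import statistics
import time


def time_calls(fn, iterations=10, warmup=1):
    """Call fn repeatedly and return the wall-clock latency of each timed call."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize_latencies(latencies_sec):
    """Return mean, P50, and P90 latency (seconds) and throughput (samples/second)."""
    ordered = sorted(latencies_sec)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank rule: rank = ceil(p * n / 100), computed with integers.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    mean = statistics.fmean(ordered)
    return {
        "mean": mean,
        "p50": percentile(50),
        "p90": percentile(90),
        "throughput": 1.0 / mean,  # one sample per call
    }


if __name__ == "__main__":
    # Stand-in workload; a real run would wrap the model call instead.
    print(summarize_latencies(time_calls(lambda: sum(range(10_000)))))
```

When timing an asynchronous accelerator, the timed function should block until the device has finished its work; otherwise the measured latency only covers kernel launch.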

3. Accuracy Validation

python accuracy.py \
    --audio assets/test_audio.wav \
    --lang_sym zh \
    --region_sym CN

Validation results:

Metric	CPU baseline	NPU output	Match
Transcription	欢迎大家来体验达摩院推出的语音识别模型	欢迎大家来体验达摩院推出的语音识别模型	Yes
Language	zh	zh	Yes
Region	CN	CN	Yes
Accuracy	100%	PASS
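A field-by-field comparison like the one in the table can be sketched as follows. The details are assumptions for illustration: `compare_outputs` is a hypothetical helper, the result dictionaries use the keys from the example output, and accuracy is taken here as the fraction of compared fields that match exactly, which may differ from how accuracy.py computes it.

```python
def compare_outputs(cpu, npu, fields=("text_nospecial", "language", "region")):
    """Compare CPU and NPU result dicts; return per-field rows and the match ratio."""
    rows = []
    for field in fields:
        cpu_value, npu_value = cpu.get(field), npu.get(field)
        rows.append((field, cpu_value, npu_value, cpu_value == npu_value))
    # Assumption: accuracy = exact-match fields / compared fields.
    accuracy = sum(1 for row in rows if row[3]) / len(rows)
    return rows, accuracy


if __name__ == "__main__":
    cpu = {"text_nospecial": "欢迎大家来体验达摩院推出的语音识别模型",
           "language": "zh", "region": "CN"}
    npu = dict(cpu)  # stand-in for the NPU result
    rows, accuracy = compare_outputs(cpu, npu)
    for field, cpu_value, npu_value, match in rows:
        print(f"{field}\t{cpu_value}\t{npu_value}\t{'Yes' if match else 'No'}")
    print(f"Accuracy\t{accuracy:.0%}\t{'PASS' if accuracy == 1.0 else 'FAIL'}")
```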

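NPU Adaptation Notes

The model is a pure PyTorch model, so NPU adaptation amounts to passing device="npu" to load_model().
No code changes to the original dolphin package are required.
Audio is loaded with soundfile instead of ffmpeg to reduce external binary dependencies.
CPU and NPU outputs have been verified to be consistent: the transcription, language, and region predictions match exactly.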
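The device selection described above might look like the sketch below. Several parts are assumptions: `resolve_device` and `load_dolphin` are hypothetical helpers, the `load_model(model_name, model_dir, device)` call follows the usage shown in the upstream Dolphin README and is not verified here against the installed package version, and the fallback to CPU is a convenience that the repository scripts may not implement.

```python
import importlib.util


def resolve_device(requested: str = "npu") -> str:
    """Fall back to "cpu" when "npu" is requested but torch_npu is not installed.

    Any other request is returned unchanged.
    """
    if requested == "npu" and importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    return requested


def load_dolphin(model_dir: str, requested: str = "npu"):
    """Load dolphin-small on the resolved device (assumed call signature)."""
    device = resolve_device(requested)
    if device == "npu":
        # Importing torch_npu makes the "npu" device available to PyTorch.
        import torch_npu  # noqa: F401
    # Deferred import so this module can be imported without dolphin installed.
    import dolphin

    # Assumption: positional arguments are (model_name, model_dir, device).
    return dolphin.load_model("small", model_dir, device)


if __name__ == "__main__":
    print(resolve_device("npu"))
```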

Citation

@article{dolphin2025,
  title={Dolphin: A Large-Scale Multilingual Multitask ASR Model for 40 Eastern Languages},
  author={Dataocean AI and Tsinghua University},
  journal={arXiv preprint arXiv:2503.20212},
  year={2025}
}
