OPUS-MT-ML-EN Ascend NPU Deployment Guide

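Overview

OPUS-MT-ML-EN is Helsinki-NLP's Malayalam-to-English machine translation model (MarianMT). It uses the Transformer encoder-decoder architecture and translates Malayalam (ML) text into English (EN).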

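Features

Inference acceleration on Ascend NPUs
CPU vs NPU precision comparison test (outputs match exactly)
Efficient neural machine translation
Compatible with HuggingFace transformers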

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
Docker: container named test-modelagent
transformers: 4.8+
sentencepiece: required by MarianTokenizer

Directory Structure

opus-mt-ml-en-ascend/
├── inference.py          # Inference test script
├── log.txt               # Test log
├── README.md             # This document
├── test_sample.txt       # Test samples
├── inference_result.json # Inference results
└── precision_result.json # Precision test results

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set the CANN environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh
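set_env.sh only affects the shell it is sourced in, so a new `docker exec` session needs it again. The minimal sketch below checks whether the environment looks loaded. It assumes that set_env.sh exports ASCEND_TOOLKIT_HOME and ASCEND_OPP_PATH; treat those names as an assumption and adjust them to what your CANN release actually sets. The function name is hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumption: variables exported by /usr/local/Ascend/ascend-toolkit/set_env.sh.
# Adjust this tuple if your CANN release uses different names.
EXPECTED_CANN_VARS = ("ASCEND_TOOLKIT_HOME", "ASCEND_OPP_PATH")


def missing_cann_vars(
    env: Optional[Mapping[str, str]] = None,
    names: Iterable[str] = EXPECTED_CANN_VARS,
) -> List[str]:
    """Return the expected CANN variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_cann_vars()
    if missing:
        print("CANN environment not loaded, missing:", ", ".join(missing))
    else:
        print("CANN environment looks loaded")
```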

3. Prepare the model files

The model files are located in /data/ysws/agentsp/5-18-1/opus-mt-ml-en/Helsinki-NLP/opus-mt-ml-en/:

pytorch_model.bin - model weights (~305MB)
config.json - model configuration
source.spm / target.spm - SentencePiece models
vocab.json - vocabulary
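Before loading the model, it can help to confirm that every file listed above is present, since a missing SentencePiece model otherwise only surfaces as a tokenizer error. A minimal stdlib sketch; the helper name is hypothetical and the file list is exactly the one given in this step.

```python
from pathlib import Path
from typing import List

# Files listed in step 3; all of them must exist in the model directory
REQUIRED_MODEL_FILES = (
    "pytorch_model.bin",
    "config.json",
    "source.spm",
    "target.spm",
    "vocab.json",
)


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required files that are not regular files under model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    model_dir = "/data/ysws/agentsp/5-18-1/opus-mt-ml-en/Helsinki-NLP/opus-mt-ml-en"
    missing = missing_model_files(model_dir)
    print("missing files:", ", ".join(missing) if missing else "none")
```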

4. Install dependencies

pip install transformers sentencepiece torch_npu
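To confirm the installation inside the container, the sketch below checks that each package can be found on the import path without importing it. sentencepiece is included because MarianTokenizer depends on it; torch is listed because torch_npu runs on top of it. The helper name is hypothetical.

```python
import importlib.util
from typing import Iterable, List

# Top-level import names of the packages this guide relies on
REQUIRED_PACKAGES = ("torch", "torch_npu", "transformers", "sentencepiece")


def missing_packages(names: Iterable[str] = REQUIRED_PACKAGES) -> List[str]:
    """Return the top-level packages that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

find_spec only locates a package; a torch/torch_npu version mismatch will still show up at import time.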

Usage

Option 1: Standard inference mode

Run the inference script to translate:

cd /data/ysws/agentsp/5-18-1/opus-mt-ml-en-ascend/

# Use the default test sentence
python3 inference.py

# Specify the device
python3 inference.py npu:0

Option 2: Precision test mode (CPU vs NPU)

Run the precision comparison test to verify that the NPU results match the CPU results:

cd /data/ysws/agentsp/5-18-1/opus-mt-ml-en-ascend/

# Run the full precision test
python3 inference.py precision_test

Command-Line Arguments

Argument	Description	Default
`mode`	Test mode: all, inference, precision_test	`all`
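The argument handling inside inference.py is not shown in this guide, and the usage lines above pass either a mode (precision_test) or a device (npu:0) as a single positional argument. The following is a hypothetical sketch of one way to accept both forms; the function name and the npu:0 default (taken from "Device: npu:0" in the test log) are assumptions, not the script's actual code.

```python
from typing import List, Tuple

# Hypothetical: mirrors the documented invocations, not inference.py itself
MODES = ("all", "inference", "precision_test")
DEFAULT_MODE = "all"
DEFAULT_DEVICE = "npu:0"  # assumption, based on "Device: npu:0" in the test log


def parse_cli(argv: List[str]) -> Tuple[str, str]:
    """Return (mode, device) from the arguments that follow the script name."""
    mode, device = DEFAULT_MODE, DEFAULT_DEVICE
    for arg in argv:
        if arg in MODES:
            mode = arg
        elif arg.split(":")[0] in ("cpu", "npu"):
            device = arg  # e.g. "cpu", "npu:0", "npu:1"
        else:
            raise ValueError(f"unrecognized argument: {arg!r}")
    return mode, device
```

In a script this would be called as `mode, device = parse_cli(sys.argv[1:])`. Matching on the value rather than on position lets `python3 inference.py npu:0` and `python3 inference.py precision_test` both work.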

Test Verification

Precision Test Results

Metric	Measured	Threshold	Status
Output match	True	100%	PASS
NPU speedup	13.89x	> 10x	PASS
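The "Output match" row compares the translated strings from both devices against a 100% threshold. A minimal sketch of such a check follows. It assumes exact per-sentence string equality, because how inference.py actually compares outputs is not shown here, and the helper names are hypothetical.

```python
from typing import Sequence


def output_match_rate(cpu_outputs: Sequence[str], npu_outputs: Sequence[str]) -> float:
    """Fraction of sentences whose CPU and NPU translations are identical strings."""
    if len(cpu_outputs) != len(npu_outputs):
        raise ValueError("CPU and NPU output lists must have the same length")
    if not cpu_outputs:
        raise ValueError("no outputs to compare")
    matches = sum(c == n for c, n in zip(cpu_outputs, npu_outputs))
    return matches / len(cpu_outputs)


def match_status(rate: float, threshold: float = 1.0) -> str:
    # A threshold of 1.0 means every sentence must match exactly (100%)
    return "PASS" if rate >= threshold else "FAIL"
```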

Performance Data

Operation	Time
CPU inference time	2.684s
NPU inference time	0.193s
Speedup	13.89x
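The speedup is the ratio of CPU time to NPU time, checked against the "> 10x" threshold from the precision table. A small worked example with the rounded times from the table above; the helper name is hypothetical.

```python
def speedup(cpu_seconds: float, npu_seconds: float) -> float:
    """Speedup = CPU time / NPU time."""
    if npu_seconds <= 0:
        raise ValueError("NPU time must be positive")
    return cpu_seconds / npu_seconds


# With the rounded table values, 2.684 / 0.193 is about 13.91; the 13.89x
# reported in the log was presumably computed from unrounded timings.
ratio = speedup(2.684, 0.193)
status = "PASS" if ratio > 10 else "FAIL"
print(f"speedup: {ratio:.2f}x ({status})")
```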

Translation Examples

Input (ML)	Output (EN)
"ഹലോ, നിങ്ങൾ ഇന്ന് എങ്ങനെയാണ്?"	"Hello, how are you today?"
"ഞാൻ നിങ്ങളെ കാണാൻ വളരെ സന്തോഷിക്കുന്നു"	"I'm very happy to see you."
"ഇന്നത്തെ കാലാവസ്ഥ വളരെ നല്ലതാണ്"	"The weather is very nice today."

Result: the CPU and NPU outputs are identical, and the translations are of good quality.

Test Log

============================================================
OPUS-MT-ML-EN NPU Test
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-18-1/opus-mt-ml-en/Helsinki-NLP/opus-mt-ml-en

Input text: ['ഹലോ, നിങ്ങൾ ഇന്ന് എങ്ങനെയാണ്?']
Input shape: torch.Size([1, 7])
Generated text: ['Hello, how are you today?']
Inference time: 1.001s

============================================================
Precision Test (CPU vs NPU)
============================================================
CPU inference time: 2.684s
NPU inference time: 0.193s
Speedup: 13.89x
CPU output: ['Hello, how are you today?']
NPU output: ['Hello, how are you today?']
Output texts match: True
Status: PASS

============================================================
Test Sample
============================================================
  1. ഹലോ, നിങ്ങൾ ഇന്ന് എങ്ങനെയാണ്?
  2. ഞാൻ നിങ്ങളെ കാണുന്നതിൽ സന്തോഷമാണ്.
  3. ഇന്നത്തെ കാലാവസ്ഥ നല്ലതാണ്.
============================================================

Result: the CPU and NPU outputs are identical, with an NPU speedup of 13.89x.
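Because log.txt uses fixed line prefixes, its key numbers can be extracted for automated checks, for example in CI. The sketch below assumes the exact line format shown in the log above. If inference.py changes its wording, the patterns must change too. The function name is hypothetical.

```python
import re
from typing import Dict

# Patterns follow the line format of the precision-test section of log.txt
_FLOAT_PATTERNS = {
    "cpu_time_s": r"CPU inference time:\s*([\d.]+)s",
    "npu_time_s": r"NPU inference time:\s*([\d.]+)s",
    "speedup": r"Speedup:\s*([\d.]+)x",
}


def parse_precision_log(text: str) -> Dict[str, object]:
    """Extract timings, speedup, match flag and status from log.txt content."""
    result: Dict[str, object] = {}
    for key, pattern in _FLOAT_PATTERNS.items():
        m = re.search(pattern, text)
        if m:
            result[key] = float(m.group(1))
    m = re.search(r"Output texts match:\s*(True|False)", text)
    if m:
        result["outputs_match"] = m.group(1) == "True"
    m = re.search(r"Status:\s*(\w+)", text)
    if m:
        result["status"] = m.group(1)
    return result
```

Typical use: `parse_precision_log(open("log.txt", encoding="utf-8").read())`, then fail the job if `status` is not `"PASS"`.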

Python API Examples

Basic translation

import torch
import torch_npu  # registers the "npu" device type with PyTorch
from transformers import MarianTokenizer, MarianMTModel

MODEL_DIR = "/data/ysws/agentsp/5-18-1/opus-mt-ml-en/Helsinki-NLP/opus-mt-ml-en"

# Load the SentencePiece tokenizer and the Marian model, then move the model to the NPU
tokenizer = MarianTokenizer.from_pretrained(MODEL_DIR)
model = MarianMTModel.from_pretrained(MODEL_DIR)
model = model.to("npu:0")
model.eval()

src_texts = ["ഹലോ, നിങ്ങൾ ഇന്ന് എങ്ങനെയാണ്?"]
inputs = tokenizer(src_texts, return_tensors="pt", padding=True)
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    # Pass input_ids and attention_mask so that padded positions are ignored
    outputs = model.generate(**inputs, max_new_tokens=50)

translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(translations)  # ["Hello, how are you today?"]

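Batch translation

# Continues from the basic example: torch, tokenizer and model are already loaded
src_texts = [
    "ഹലോ, നിങ്ങൾ ഇന്ന് എങ്ങനെയാണ്?",
    "ഞാൻ നിങ്ങളെ കാണാൻ വളരെ സന്തോഷിക്കുന്നു",
    "ഇന്നത്തെ കാലാവസ്ഥ വളരെ നല്ലതാണ്"
]

# padding=True pads every sentence to the length of the longest one
inputs = tokenizer(src_texts, return_tensors="pt", padding=True)
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    # The attention mask is required here so the padding does not affect the shorter sentences
    outputs = model.generate(**inputs, max_new_tokens=50)

translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for src, trans in zip(src_texts, translations):
    print(f"{src} -> {trans}")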
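For inputs too large to pass in one call, the sentence list can be split into fixed-size batches, each tokenized and translated as in the example above. A minimal stdlib sketch; the helper name is hypothetical, and the right batch size depends on sentence length and NPU memory.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Usage with the batch example: `for chunk in batched(src_texts, 16): inputs = tokenizer(chunk, return_tensors="pt", padding=True) ...`, where 16 is an arbitrary illustrative value. Keeping order lets the translations be zipped back onto the source sentences.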

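Model Architecture

Architecture: Marian (Transformer)
Encoder: 6 Transformer layers
Decoder: 6 Transformer layers
Hidden size: 768
Attention heads: 12
Vocabulary size: ~50k
Language direction: ML → EN

Component	Description
encoder	6-layer Transformer encoder
decoder	6-layer Transformer decoder
vocab	SentencePiece vocabulary (~50k)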
Configuration Parameters

Key parameters taken from config.json:

{
  "hidden_size": 768,
  "encoder_layers": 6,
  "decoder_layers": 6,
  "encoder_attention_heads": 12,
  "decoder_attention_heads": 12,
  "d_model": 768
}
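Instead of copying these values by hand, they can be read directly from the checkpoint's config.json, which keeps the tables above in sync with the model actually loaded. A minimal stdlib sketch: the keys are standard MarianConfig field names, vocab_size is included on the assumption that the file defines it (missing keys come back as None), and the function name is hypothetical. With the values listed above, the per-head dimension is 768 / 12 = 64.

```python
import json
from typing import Any, Dict

KEYS = (
    "d_model",
    "encoder_layers",
    "decoder_layers",
    "encoder_attention_heads",
    "decoder_attention_heads",
    "vocab_size",
)


def summarize_config(path: str) -> Dict[str, Any]:
    """Read a Marian config.json and return the architecture fields plus head_dim."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    summary: Dict[str, Any] = {k: cfg.get(k) for k in KEYS}  # None if absent
    # Each encoder attention head works on d_model / heads dimensions
    if summary["d_model"] and summary["encoder_attention_heads"]:
        summary["head_dim"] = summary["d_model"] // summary["encoder_attention_heads"]
    return summary
```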

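FAQ

Q: What if the precision test fails?

A: Check that the NPU driver is installed correctly and that the CANN environment script has been sourced. OPUS-MT decoding here does not sample, so output on a given device is deterministic, and CPU and NPU outputs are expected to match. The two devices use different floating-point kernels, however, so a rare one-token difference on a borderline input does not by itself mean the installation is broken.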

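Q: How can I speed up inference?

A: Translating several sentences in one generate call significantly increases throughput. The first inference also pays a one-time compilation cost, so later runs are faster: in the test log above, the first NPU run took 1.001s, while the NPU run in the precision test took 0.193s.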
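When benchmarking, excluding the first call avoids counting the one-time compilation cost. The sketch below times any zero-argument callable after a number of untimed warm-up calls and returns the median of several runs. The helper name and the default counts are assumptions. For NPU timing, the callable should end with a host-side result (for example decoded strings), because copying the output back to the CPU waits for the device work to finish.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], warmup: int = 1, repeats: int = 3) -> float:
    """Median wall-clock seconds of fn() measured after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # absorbs one-time costs such as first-run compilation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

With the Python API example, the callable could be `lambda: tokenizer.batch_decode(model.generate(**inputs, max_new_tokens=50), skip_special_tokens=True)`, called inside `torch.no_grad()`.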

Q: Which language directions are supported?

A: This model only translates ML (Malayalam) → EN (English). Other directions require the corresponding OPUS-MT model.
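OPUS-MT checkpoints on Hugging Face follow the naming pattern Helsinki-NLP/opus-mt-{src}-{tgt}; this model is Helsinki-NLP/opus-mt-ml-en. The hypothetical helper below only builds the ID from two language codes. Not every pair has a published model, so check that the repository exists before passing the ID to from_pretrained.

```python
def opus_mt_model_id(src: str, tgt: str) -> str:
    """Build a Hub ID of the form Helsinki-NLP/opus-mt-{src}-{tgt}.

    The codes are used as given; this does not check that the model exists.
    """
    for code in (src, tgt):
        if not code or "/" in code:
            raise ValueError(f"invalid language code: {code!r}")
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"
```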

References

Original model: https://huggingface.co/Helsinki-NLP/opus-mt-ml-en
MarianMT: https://huggingface.co/docs/transformers/model_doc/marian
HuggingFace Transformers: https://huggingface.co/transformers

License

This project is licensed under Apache-2.0.