
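opus-mt-ar-it Ascend NPU Deployment Guide

Overview

opus-mt-ar-it is an Arabic-to-Italian translation model from the Helsinki-NLP OPUS-MT project. It uses the MarianMT architecture, a Transformer encoder-decoder with 6 layers on each side, and accepts input sequences of up to 512 positions.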

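Features

Inference acceleration on Ascend NPU
Consistency check between CPU and NPU translation output
About 3.8x faster than CPU in the 3-sentence test reported below
SentencePiece-based tokenization
Batch translation of multiple sentences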

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
Docker: container named test-modelagent
transformers: 4.8+

Directory Structure

opus-mt-ar-it-ascend/
├── inference.py              # inference and test script
├── log.txt                   # test log
├── README.md                 # this document
├── test_sentences.txt        # test sentences
├── precision_result.json     # precision test results
└── inference_result.json     # inference output

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

3. Prepare the model files

The model files are located in /data/ysws/agentsp/5-20/opus-mt-ar-it/:

pytorch_model.bin - model weights (about 304 MB)
config.json - model configuration
vocab.json - vocabulary file
source.spm / target.spm - SentencePiece models
tokenizer_config.json - tokenizer configuration
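
Before loading the model, it can help to confirm that every file listed above is present. The following is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical, and the file list is copied from this section.

```python
from pathlib import Path

# File names taken from the list in this section
REQUIRED_FILES = [
    "pytorch_model.bin",
    "config.json",
    "vocab.json",
    "source.spm",
    "target.spm",
    "tokenizer_config.json",
]


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("/data/ysws/agentsp/5-20/opus-mt-ar-it")
    print("missing files:", missing or "none")
```

An empty result means all six files were found; any names returned should be restored before running inference.py.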

4. Install dependencies

pip install transformers torch_npu sentencepiece sacremoses
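
After installation, a quick check that the packages can be found may save a failed run later. This is a sketch rather than part of the project; `missing_modules` is a hypothetical helper, and `sentencepiece` is included in the list because the Marian tokenizer in transformers depends on it.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages used in this guide
    wanted = ["torch", "torch_npu", "transformers", "sentencepiece", "sacremoses"]
    print("missing modules:", missing_modules(wanted) or "none")
```

`find_spec` only locates the modules without importing them, so the check is fast and does not touch the NPU.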

Usage

Option 1: standard inference mode

Run translation inference (NPU only):

cd /data/ysws/agentsp/5-20/opus-mt-ar-it-ascend/

python3 inference.py

Option 2: precision test mode (CPU vs NPU)

Run the precision comparison to verify that the NPU translations match the CPU output:

cd /data/ysws/agentsp/5-20/opus-mt-ar-it-ascend/

python3 inference.py precision_test
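
The comparison this mode performs can be sketched as follows. This is an illustration under assumptions, not the actual code of inference.py: `compare_backends` is a hypothetical helper, and the two `translate_*` arguments stand for any callables that take a list of sentences and return a list of translations.

```python
import time


def compare_backends(sentences, translate_cpu, translate_npu):
    """Run both translate callables on the same input and compare the results."""
    start = time.perf_counter()
    cpu_out = translate_cpu(sentences)
    cpu_time = time.perf_counter() - start

    start = time.perf_counter()
    npu_out = translate_npu(sentences)
    npu_time = time.perf_counter() - start

    return {
        "match": cpu_out == npu_out,  # exact string equality, sentence by sentence
        "cpu_time": cpu_time,
        "npu_time": npu_time,
        "speedup": cpu_time / npu_time if npu_time > 0 else float("inf"),
    }
```

Exact string equality is a strict criterion; it suits a short smoke test like this one, where any divergence between devices is worth investigating.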

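Test Verification

Precision test results

Metric	Measured	Threshold	Status
Translation output consistency	Identical	Equal	PASS
CPU inference time (3 sentences)	5.01 s	-	-
NPU inference time (3 sentences)	1.31 s	-	-
Speedup	3.82x	> 1x	PASS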

Sample translations

Input (Arabic)	Output (Italian)
صباح الخير، كيف حالك؟	Buongiorno. Come stai?
سررت بلقائك.	Piacere di conoscerti.
الحياة جميلة عندما نقدرها.	La vita e' bella quando la si apprezza.

Result: the CPU and NPU translations are identical, and the output reads as expected.

Test log

============================================================
opus-mt-ar-it - Ascend NPU Translation Test
Output: /data/ysws/agentsp/5-20/opus-mt-ar-it-ascend
============================================================

Mode: PRECISION TEST
NPU available: True
Device: npu:0

============================================================
Loading Model and Tokenizer
============================================================
Tokenizer loaded successfully
Model loaded successfully

============================================================
Running CPU Translation
============================================================
Input:  صباح الخير، كيف حالك؟
Output: Buongiorno. Come stai?
Input:  سررت بلقائك.
Output: Piacere di conoscerti.
Input:  الحياة جميلة عندما نقدرها.
Output: La vita e' bella quando la si apprezza.

CPU total time: 5.0100s

============================================================
Running NPU Translation
============================================================
Input:  صباح الخير، كيف حالك؟
Output: Buongiorno. Come stai?
Input:  سررت بلقائك.
Output: Piacere di conoscerti.
Input:  الحياة جميلة عندما نقدرها.
Output: La vita e' bella quando la si apprezza.

NPU total time: 1.3121s
Speedup: 3.82x

============================================================
Precision Test Results
============================================================

Translation outputs match: PASS

============================================================
Test Complete!
============================================================

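Python API Examples

Basic translation

import torch
import torch_npu  # noqa: F401  (importing it registers the "npu" device with PyTorch)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM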

MODEL_DIR = "/data/ysws/agentsp/5-20/opus-mt-ar-it"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_DIR)

device = torch.device("npu:0")
model = model.to(device)
model.eval()

sentences = ["صباح الخير، كيف حالك؟"]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=100)

translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Translation: {translation}")
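
The example above hard-codes `npu:0`. On machines where the NPU may be absent, a fallback to CPU can be sketched as below. `pick_device` is a hypothetical helper, not part of the project; it assumes that `torch.npu.is_available()` becomes usable once `torch_npu` has been imported.

```python
def pick_device(index=0):
    """Return "npu:<index>" when an Ascend NPU is usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (registers the NPU backend)

        if torch.npu.is_available():
            return f"npu:{index}"
    except (ImportError, OSError, RuntimeError):
        # torch_npu is not installed, or the CANN runtime could not be loaded
        pass
    return "cpu"


if __name__ == "__main__":
    print("using device:", pick_device())
```

The returned string can be passed to `torch.device(...)` in place of the literal `"npu:0"`.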

Batch translation

# Reuses tokenizer, model, and device from the basic translation example
sentences = [
    "صباح الخير، كيف حالك؟",
    "سررت بلقائك.",
    "الحياة جميلة عندما نقدرها."
]

inputs = tokenizer(sentences, return_tensors="pt", padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=100)

translations = [tokenizer.decode(out, skip_special_tokens=True) for out in outputs]
for src, tgt in zip(sentences, translations):
    print(f"{src} -> {tgt}")
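
For inputs much longer than three sentences, passing everything to `generate` at once may exhaust device memory. One common approach is to split the list into fixed-size batches first. The helper below is a sketch; the name `batched` and the batch size of 8 in the usage comment are illustrative choices, not values from this project.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Usage idea (tokenizer, model, device as in the examples above):
# for batch in batched(sentences, 8):
#     inputs = tokenizer(batch, return_tensors="pt", padding=True)
#     ...
```

Padding is applied per batch, so grouping sentences of similar length into the same batch reduces wasted computation.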

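Model Structure

Architecture: MarianMT (Transformer encoder-decoder)
Encoder: 6 Transformer layers
Decoder: 6 Transformer layers
Hidden size: 512
Attention heads: 8
Feed-forward dimension: 2048
Vocabulary size: 62526

Component	Description
encoder	6-layer Transformer encoder
decoder	6-layer Transformer decoder
lm_head	Language-model head (projects decoder states to vocabulary logits)

Inference Parameters

Key parameters extracted from config.json: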

{
  "d_model": 512,
  "encoder_layers": 6,
  "decoder_layers": 6,
  "encoder_attention_heads": 8,
  "decoder_attention_heads": 8,
  "encoder_ffn_dim": 2048,
  "decoder_ffn_dim": 2048,
  "vocab_size": 62526,
  "max_position_embeddings": 512,
  "scale_embedding": true
}
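
These values are enough for a rough parameter count. The sketch below is an estimate under stated assumptions: a single embedding matrix shared by the encoder, the decoder, and the output head; positional embeddings not counted; biases on every linear layer; 4 bytes per parameter (fp32).

```python
# Values copied from the config.json excerpt above
D_MODEL, FFN_DIM, VOCAB = 512, 2048, 62526
ENC_LAYERS, DEC_LAYERS, HEADS = 6, 6, 8


def linear(n_in, n_out):
    """Weights plus bias of one fully connected layer."""
    return n_in * n_out + n_out


attention = 4 * linear(D_MODEL, D_MODEL)  # q, k, v and output projections
ffn = linear(D_MODEL, FFN_DIM) + linear(FFN_DIM, D_MODEL)
layer_norm = 2 * D_MODEL  # scale and bias

encoder_layer = attention + ffn + 2 * layer_norm      # self-attention + FFN
decoder_layer = 2 * attention + ffn + 3 * layer_norm  # adds cross-attention
embedding = VOCAB * D_MODEL  # assumed shared (tied) across the model

total = embedding + ENC_LAYERS * encoder_layer + DEC_LAYERS * decoder_layer
head_dim = D_MODEL // HEADS

print(f"head_dim = {head_dim}")                  # 64
print(f"parameters = {total:,}")                 # 76,151,808
print(f"fp32 size = {total * 4 / 1e6:.1f} MB")   # 304.6 MB
```

Under these assumptions the estimate lands close to the roughly 304 MB size of pytorch_model.bin, which suggests the weights are stored in fp32.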

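FAQ

Q: The translation is empty or contains special symbols?

A: Check that the tokenizer loaded correctly. Use AutoTokenizer.from_pretrained() rather than loading the tokenizer files by hand.

Q: Inference is slow?

A: The NPU is best suited to large matrix operations; the speedup over CPU measured in the test above is about 3.8x. The first inference call includes one-time compilation overhead, so subsequent calls run faster.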
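
To keep that one-time overhead out of a measurement, run a warm-up call before timing. The helper below is a generic sketch, not code from inference.py; `time_after_warmup` and its default counts are illustrative. Here `fn` stands for any zero-argument callable that performs one full translation, including decoding the output to text.

```python
import time


def time_after_warmup(fn, warmup=1, repeats=3):
    """Call fn `warmup` times untimed, then return the mean time of `repeats` calls."""
    for _ in range(warmup):
        fn()  # absorbs first-call costs such as compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

Having `fn` decode the result to text ensures the device work has finished before the clock stops.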

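Q: How do I adjust the translation length?

A: Change the max_new_tokens argument passed to model.generate(). The examples in this guide use 100.

Q: Which language pairs are supported?

A: This model only translates from Arabic (ar) to Italian (it). Other language pairs require the corresponding model.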

References

Original model: https://huggingface.co/Helsinki-NLP/opus-mt-ar-it
OPUS-MT: https://github.com/Helsinki-NLP/OPUS-MT
MarianMT: https://arxiv.org/abs/1804.05519

License

This project is released under the Apache-2.0 license.