
opus-mt-fr-ar Ascend NPU Deployment Guide

Project Overview

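opus-mt-fr-ar is a French-to-Arabic translation model from Helsinki-NLP's OPUS-MT project. It uses a 6-layer Transformer encoder-decoder architecture (MarianMT) and accepts sequences of up to 512 tokens (max_position_embeddings = 512).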

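Features

Ascend NPU-accelerated inference
Consistency check between CPU and NPU translation outputs
About 4.4x faster translation than on CPU (measured on 3 test sentences)
SentencePiece-based tokenization
Batch translation of multiple sentences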

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
Docker: a container named test-modelagent
transformers: 4.8+

Directory Structure

opus-mt-fr-ar-ascend/
├── inference.py             # inference test script
├── log.txt                  # test log
├── README.md                # this document
├── test_sentences.txt       # test sentences
├── precision_result.json    # precision test results
└── inference_result.json    # inference output

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set up the CANN environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

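3. Prepare the model files

The model files are in /data/ysws/agentsp/5-20/opus-mt-fr-ar/:

pytorch_model.bin - model weights (about 304 MB)
config.json - model configuration
vocab.json - vocabulary
source.spm / target.spm - SentencePiece models for the source (French) and target (Arabic) sides
tokenizer_config.json - tokenizer configuration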
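Before loading the model, you can confirm that all of these files are present. The sketch below assumes that the list above is the complete set of required files. The helper `missing_model_files` is hypothetical and is not part of inference.py.

```python
from pathlib import Path

# File names taken from the list above (assumed to be the complete required set)
REQUIRED_FILES = [
    "pytorch_model.bin",
    "config.json",
    "vocab.json",
    "source.spm",
    "target.spm",
    "tokenizer_config.json",
]


def missing_model_files(model_dir):
    """Return the required files that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("/data/ysws/agentsp/5-20/opus-mt-fr-ar")
    print("All model files present" if not missing else f"Missing: {missing}")
```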

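4. Install dependencies

pip install transformers torch_npu sacremoses sentencepiece

The torch_npu version must match the installed PyTorch version. The Marian tokenizer requires sentencepiece.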
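After installing, you can run a quick stdlib-only check to confirm that each package can be found, without importing it. This is a sketch, and `check_packages` is a hypothetical helper. Finding torch_npu this way does not prove it will import: importing it typically also needs the CANN environment from step 2.

```python
import importlib.util

# Module names to probe; sentencepiece is needed by the Marian tokenizer.
# The pip packages listed here import under these same module names.
REQUIRED_MODULES = ["torch", "torch_npu", "transformers", "sentencepiece", "sacremoses"]


def check_packages(names):
    """Map each module name to True if it is found on the import path (it is not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED_MODULES).items():
        print(f"{name:15s} {'OK' if ok else 'MISSING'}")
```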

Usage

Option 1: Standard inference

Run translation inference (NPU only):

cd /data/ysws/agentsp/5-20/opus-mt-fr-ar-ascend/

python3 inference.py

Option 2: Precision test (CPU vs. NPU)

Run the precision comparison test to check that the NPU translations match the CPU output:

cd /data/ysws/agentsp/5-20/opus-mt-fr-ar-ascend/

python3 inference.py precision_test

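Test Results

Precision Test Results

Metric	Measured	Threshold	Status
Translation consistency	Identical	Exact match	PASS
CPU inference time (3 sentences)	5.69 s	-	-
NPU inference time (3 sentences)	1.30 s	-	-
Speedup	4.37x	> 1x	PASS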
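The two PASS criteria in the table reduce to an exact string comparison and a ratio of wall-clock times. The minimal sketch below assumes that translations are compared as plain strings. The name `evaluate_precision` is hypothetical, and this document does not describe how inference.py actually computes these values.

```python
def evaluate_precision(cpu_outputs, npu_outputs, cpu_time_s, npu_time_s):
    """Compare CPU and NPU translations and compute the NPU speedup (hypothetical helper)."""
    outputs_match = list(cpu_outputs) == list(npu_outputs)  # exact, order-sensitive string match
    speedup = cpu_time_s / npu_time_s
    return {
        "consistency": "PASS" if outputs_match else "FAIL",
        "speedup": round(speedup, 2),
        "speedup_status": "PASS" if speedup > 1.0 else "FAIL",  # threshold "> 1x" from the table
    }


if __name__ == "__main__":
    outputs = ["مرحباً، كيف حالك؟", "أنا سعيد جداً لمقابلتك.", "الحياة جميلة عندما تُقدرها."]
    # Total times from the test log: 5.6900 s (CPU) and 1.3019 s (NPU)
    print(evaluate_precision(outputs, outputs, 5.6900, 1.3019))
```

With the logged totals, this reports a speedup of 4.37 and PASS for both checks, matching the table. The rounded 1.30 s from the table would give 4.38 instead.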

Sample Translations

Input (French)	Output (Arabic)
Bonjour, comment allez-vous?	مرحباً، كيف حالك؟
Je suis très heureux de vous rencontrer.	أنا سعيد جداً لمقابلتك.
La vie est belle quand on sait l'apprécier.	الحياة جميلة عندما تُقدرها.

Result: CPU and NPU produce identical translations, and the translation quality is normal.

Test Log

============================================================
opus-mt-fr-ar - Ascend NPU Translation Test
Output: /data/ysws/agentsp/5-20/opus-mt-fr-ar-ascend
============================================================

Mode: PRECISION TEST
NPU available: True
Device: npu:0

============================================================
Loading Model and Tokenizer
============================================================
Tokenizer loaded successfully
Model loaded successfully

============================================================
Running CPU Translation
============================================================
Input:  Bonjour, comment allez-vous?
Output: مرحباً، كيف حالك؟
Input:  Je suis très heureux de vous rencontrer.
Output: أنا سعيد جداً لمقابلتك.
Input:  La vie est belle quand on sait l'apprécier.
Output: الحياة جميلة عندما تُقدرها.

CPU total time: 5.6900s

============================================================
Running NPU Translation
============================================================
Input:  Bonjour, comment allez-vous?
Output: مرحباً، كيف حالك؟
Input:  Je suis très heureux de vous rencontrer.
Output: أنا سعيد جداً لمقابلتك.
Input:  La vie est belle quand on sait l'apprécier.
Output: الحياة جميلة عندما تُقدرها.

NPU total time: 1.3019s
Speedup: 4.37x

============================================================
Precision Test Results
============================================================

Translation outputs match: PASS

============================================================
Test Complete!
============================================================

Python API Examples

Basic translation

import torch
import torch_npu  # noqa: F401  (registers the "npu" device type with PyTorch)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_DIR = "/data/ysws/agentsp/5-20/opus-mt-fr-ar"

# Load the SentencePiece-based Marian tokenizer and the seq2seq model
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_DIR)

# Move the model to the first NPU and switch to inference mode
device = torch.device("npu:0")
model = model.to(device)
model.eval()

sentences = ["Bonjour, comment allez-vous?"]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Decoding settings such as num_beams come from the model's configuration;
# the output is capped at 100 new tokens
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)

translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Translation: {translation}")

Batch translation

# Reuses tokenizer, model and device from the basic example
sentences = [
    "Bonjour, comment allez-vous?",
    "Je suis très heureux de vous rencontrer.",
    "La vie est belle quand on sait l'apprécier."
]

# padding=True pads every sentence to the longest one in the batch
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)

translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for src, tgt in zip(sentences, translations):
    print(f"{src} -> {tgt}")
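For longer inputs, such as the full contents of test_sentences.txt, a single generate() call over every sentence can exhaust device memory. A common pattern is to translate in fixed-size chunks. The sketch below is hypothetical: `translate_in_batches` and `batch_size` are not part of inference.py. `translate_batch` stands for any function that wraps the tokenizer, generate and batch_decode steps above for one list of sentences.

```python
from typing import Callable, List


def translate_in_batches(
    sentences: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Translate sentences in chunks of batch_size, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    results: List[str] = []
    for start in range(0, len(sentences), batch_size):
        chunk = sentences[start:start + batch_size]
        results.extend(translate_batch(chunk))
    return results


if __name__ == "__main__":
    # Stand-in translator for demonstration only; replace it with a function that
    # calls tokenizer(...), model.generate(...) and tokenizer.batch_decode(...)
    def fake_translate(batch):
        return [s.upper() for s in batch]

    print(translate_in_batches(["a", "b", "c"], fake_translate, batch_size=2))
```

A smaller batch_size lowers peak memory at the cost of more generate() calls. The best value depends on sentence length and the NPU's memory.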

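Model Architecture

Architecture: MarianMT (Transformer encoder-decoder)
Encoder: 6 Transformer layers
Decoder: 6 Transformer layers
Hidden size (d_model): 512
Attention heads: 8
Feed-forward dimension: 2048
Vocabulary size: 61153

Component	Description
encoder	6-layer Transformer encoder
decoder	6-layer Transformer decoder
lm_head	Language modeling head (projects decoder outputs to vocabulary logits)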

Configuration Parameters

Key parameters from config.json:

{
  "d_model": 512,
  "encoder_layers": 6,
  "decoder_layers": 6,
  "encoder_attention_heads": 8,
  "decoder_attention_heads": 8,
  "encoder_ffn_dim": 2048,
  "decoder_ffn_dim": 2048,
  "vocab_size": 61153,
  "max_position_embeddings": 512,
  "scale_embedding": true
}
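As a sanity check on the ~304 MB weight file, the parameter count can be estimated from these config values. The estimate rests on several assumptions:

- The standard MarianMT layout: biased linear projections, two LayerNorms per encoder layer and three per decoder layer.
- A single embedding matrix shared by the encoder, the decoder and lm_head.
- Sinusoidal position tables and the logits bias are left out.

```python
def linear_params(n_in, n_out):
    """Weight matrix plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def estimate_marian_params(cfg):
    """Approximate parameter count of a MarianMT model (assumptions listed above)."""
    d, vocab = cfg["d_model"], cfg["vocab_size"]
    attention = 4 * linear_params(d, d)          # Q, K, V and output projections
    layer_norm = 2 * d                            # scale and shift
    enc_ffn = linear_params(d, cfg["encoder_ffn_dim"]) + linear_params(cfg["encoder_ffn_dim"], d)
    dec_ffn = linear_params(d, cfg["decoder_ffn_dim"]) + linear_params(cfg["decoder_ffn_dim"], d)

    encoder_layer = attention + enc_ffn + 2 * layer_norm
    decoder_layer = 2 * attention + dec_ffn + 3 * layer_norm  # self- and cross-attention
    embeddings = vocab * d                        # assumed shared by encoder, decoder and lm_head

    return (embeddings
            + cfg["encoder_layers"] * encoder_layer
            + cfg["decoder_layers"] * decoder_layer)


config = {
    "d_model": 512, "encoder_layers": 6, "decoder_layers": 6,
    "encoder_ffn_dim": 2048, "decoder_ffn_dim": 2048, "vocab_size": 61153,
}

params = estimate_marian_params(config)
print(f"{params:,} parameters, about {params * 4 / 1e6:.1f} MB in float32")
```

This prints 75,448,832 parameters, about 301.8 MB in float32. That is in line with the ~304 MB pytorch_model.bin, and the excluded tensors plausibly account for the small remainder.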

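FAQ

Q: Why is the translation empty or full of special tokens?

A: Check that the tokenizer loaded correctly. Load it with AutoTokenizer.from_pretrained() rather than constructing it manually, and decode with skip_special_tokens=True.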

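Q: Why is inference slow?

A: The NPU is optimized for large-scale matrix operations, and the current speedup over CPU is about 4.4x. The first inference run incurs compilation overhead, so subsequent runs are faster.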
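Because the first call includes this one-off overhead, timings are more representative when warm-up runs are excluded. The minimal sketch below assumes a callable `run` that performs one translation; `benchmark` is a hypothetical helper. NPU work may complete asynchronously, so you can pass a synchronization function as `sync` so that each timing includes the queued device work. torch.npu.synchronize, available once torch_npu is imported, is one example.

```python
import time
from statistics import mean
from typing import Callable, Optional


def benchmark(run: Callable[[], object], warmup: int = 1, repeats: int = 3,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the mean wall-clock time of run() in seconds, excluding warm-up calls."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        run()                      # warm-up calls absorb compilation/initialization cost
    if sync is not None:
        sync()                     # make sure warm-up work has finished before timing
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()                 # wait for queued device work before stopping the clock
        timings.append(time.perf_counter() - start)
    return mean(timings)


if __name__ == "__main__":
    print(f"{benchmark(lambda: sum(range(100_000))):.6f} s")
```

With the basic example loaded, a call might look like benchmark(lambda: model.generate(**inputs, max_new_tokens=100), sync=torch.npu.synchronize).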

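Q: How do I change the output length?

A: Adjust the max_new_tokens argument of generate(). The examples in this document use 100.

Q: Which language pairs are supported?

A: This model only translates French (fr) into Arabic (ar). Other language pairs require a model for that pair.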

References

原始模型: https://huggingface.co/Helsinki-NLP/opus-mt-fr-ar
OPUS-MT: https://github.com/Helsinki-NLP/OPUS-MT
MarianMT: https://arxiv.org/abs/1804.05519

License

This project is licensed under Apache-2.0.