
opus-mt-en-ROMANCE Ascend NPU 部署指南

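Overview

opus-mt-en-ROMANCE is a multilingual machine translation model from Helsinki-NLP that translates English into Romance languages, including French, Spanish, Italian, Portuguese and Romanian (40+ target languages and regional variants in total). It is a MarianMT model built on the Transformer encoder-decoder architecture, with roughly 77M parameters (consistent with the ~300 MB fp32 weight file).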

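Features

Inference acceleration on Ascend NPU
CPU vs NPU precision comparison (identical translations on the tested sentences)
Multilingual translation
Beam search decoding
Compatible with HuggingFace transformers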

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu (the torch_npu version must match the installed PyTorch version)
transformers: 4.8+

Directory structure

opus-mt-en-ROMANCE-ascend/
├── inference.py          # Inference and test script
├── log.txt               # Test log
├── README.md             # This document
├── test_sentences.txt    # Test sentences
└── precision_result.json # Precision test results

Deployment steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set up the CANN environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

3. Prepare the model files

The model files are located in /data/ysws/agentsp/5-20-1/Helsinki-NLP/opus-mt-en-ROMANCE/:

pytorch_model.bin - PyTorch model weights (about 300 MB)
config.json - model configuration
tokenizer_config.json - tokenizer configuration
vocab.json - vocabulary
source.spm / target.spm - SentencePiece models
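Before loading the model, it can save time to confirm that every file listed above is present. The sketch below assumes the file names from the list (a checkpoint shipping model.safetensors instead of pytorch_model.bin would need the tuple adjusted); `missing_model_files` is a hypothetical helper, not part of inference.py.

```python
from pathlib import Path

# File names taken from the list above (assumption: a PyTorch .bin checkpoint)
REQUIRED_FILES = (
    "pytorch_model.bin",
    "config.json",
    "tokenizer_config.json",
    "vocab.json",
    "source.spm",
    "target.spm",
)


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    model_dir = "/data/ysws/agentsp/5-20-1/Helsinki-NLP/opus-mt-en-ROMANCE"
    missing = missing_model_files(model_dir)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files present.")
```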

4. Install dependencies

pip install transformers torch_npu sentencepiece sacremoses
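To confirm that torch_npu imports cleanly and an NPU is visible after installation, a quick check like the one below may help. It is a sketch: `pick_device` is a hypothetical helper that falls back to "cpu" whenever torch or torch_npu cannot be imported, and it relies on `torch.npu.is_available()`, which torch_npu registers on import.

```python
def pick_device():
    """Return "npu:0" if torch_npu is installed and an NPU is visible, else "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  importing registers the "npu" device
    except ImportError:
        # torch or torch_npu (or a CANN library it loads) is missing
        return "cpu"
    return "npu:0" if torch.npu.is_available() else "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

If this prints "cpu" inside the container, re-check that set_env.sh was sourced in the same shell.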

Usage

Mode 1: standard inference

Run the inference script to translate the test sentences:

cd /data/ysws/agentsp/5-20-1/opus-mt-en-ROMANCE-ascend/

# Standard inference (NPU only)
python3 inference.py

Mode 2: precision test (CPU vs NPU)

Run the precision comparison to verify that NPU outputs match CPU outputs:

cd /data/ysws/agentsp/5-20-1/opus-mt-en-ROMANCE-ascend/

# Run the full precision test
python3 inference.py --precision_test
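The source of inference.py is not reproduced here, so the following is only a hypothetical sketch of the pass/fail rule the results imply: each sentence is decoded on CPU and on NPU, the two strings are compared exactly, and the run passes only when every sentence matches (the 100% threshold in the results table). `compare_translations` is an illustrative name, not the script's API.

```python
def compare_translations(cpu_outputs, npu_outputs):
    """Compare CPU and NPU translations sentence by sentence.

    Returns (match_rate, status): match_rate is the fraction of sentences
    whose decoded strings are identical, and status is "PASS" only when
    every sentence matches.
    """
    if len(cpu_outputs) != len(npu_outputs):
        raise ValueError("CPU and NPU output lists differ in length")
    if not cpu_outputs:
        raise ValueError("no outputs to compare")
    matches = sum(c == n for c, n in zip(cpu_outputs, npu_outputs))
    rate = matches / len(cpu_outputs)
    return rate, "PASS" if rate == 1.0 else "FAIL"


cpu = ["Te amo, pero lo amo más!", "Bonjour, comment estás hoy?"]
npu = list(cpu)
print(compare_translations(cpu, npu))  # (1.0, 'PASS')
```

Exact string comparison is a strict criterion: it ignores logits entirely and only asks whether both devices chose the same tokens.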

Validation

Precision test results

Metric	Measured	Threshold	Status
Translation match rate	100% (3/3 sentences)	100%	PASS
NPU speedup	13.63x	-	Significant speedup

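Performance

Measurement	Time
Average CPU inference time (per sentence)	2.4828 s
Average NPU inference time (per sentence)	0.1822 s
NPU speedup	13.63x
Total time for the 8-sentence NPU run	1.2013 s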
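The speedup in the table is the ratio of the two averages. Reproducing it from the per-sentence timings in the precision-test log (3 sentences) is straightforward; note that whether warm-up runs were excluded from these timings is not stated in the source.

```python
# Per-sentence timings copied from the precision-test log (3 sentences)
cpu_times = [2.2079, 1.5499, 3.6907]
npu_times = [0.1707, 0.1120, 0.2639]

avg_cpu = sum(cpu_times) / len(cpu_times)  # mean CPU time per sentence
avg_npu = sum(npu_times) / len(npu_times)  # mean NPU time per sentence
speedup = avg_cpu / avg_npu                # ratio of the averages

print(f"avg CPU {avg_cpu:.4f}s, avg NPU {avg_npu:.4f}s, speedup {speedup:.2f}x")
# avg CPU 2.4828s, avg NPU 0.1822s, speedup 13.63x

# The 8-sentence NPU run: total time divided by sentence count
print(f"{1.2013 / 8:.4f}s per sentence")  # 0.1502s per sentence
```

With only three sentences the speedup figure is indicative rather than a stable benchmark.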

Example translations

Input sentence	Output translation
I love you, but I love him more!	Te amo, pero lo amo más!
Hello, how are you today?	Bonjour, comment estás hoy?
The quick brown fox jumps over the lazy dog.	La raposa marrón rápida salta sobre el perro preguiçoso.
Thank you very much for your help.	Muchas gracias por su ayuda.

Result: CPU and NPU produced identical translations for all 3 precision-test sentences, and the NPU was about 13.63x faster than the CPU on average.

Test log

The full test log is saved in log.txt; its contents are reproduced below.

============================================================
opus-mt-en-ROMANCE Ascend NPU 部署测试
============================================================
MODEL_DIR: /data/ysws/agentsp/5-20-1/Helsinki-NLP/opus-mt-en-ROMANCE
OUTPUT_DIR: /data/ysws/agentsp/5-20-1/opus-mt-en-ROMANCE-ascend
Mode: precision_test

============================================================
Creating test samples
============================================================
Test sentences saved to: /data/ysws/agentsp/5-20-1/opus-mt-en-ROMANCE-ascend/test_sentences.txt
8 sentences in total

============================================================
opus-mt-en-ROMANCE NPU inference test
============================================================
Device: npu:0
Model loaded successfully!

Number of test sentences: 8
  [1] I love you, but I love him more!
  [2] Hello, how are you today?
  [3] The quick brown fox jumps over the lazy dog.
  [4] This is a sample sentence for machine translation testing.
  [5] Good morning! Nice to meet you.
  [6] Thank you very much for your help.
  [7] What is the weather like today?
  [8] I am learning machine translation.

Starting translation (device: npu:0)...

Translation results:
  [1] Source: I love you, but I love him more!
      Translation: Te amo, pero lo amo más!
  [2] Source: Hello, how are you today?
      Translation: Bonjour, comment estás hoy?
  [3] Source: The quick brown fox jumps over the lazy dog.
      Translation: La raposa marrón rápida salta sobre el perro preguiçoso.
  [4] Source: This is a sample sentence for machine translation testing.
      Translation: Esta é unha frase de exemplo para tests de traducción automática.
  [5] Source: Good morning! Nice to meet you.
      Translation: - Encantado de conocerla.
  [6] Source: Thank you very much for your help.
      Translation: Muchas gracias por su ayuda.
  [7] Source: What is the weather like today?
      Translation: Como es el clima de hoy?
  [8] Source: I am learning machine translation.
      Translation: Estou aprendendo a traducción automatique.

Total time: 1.2013s
Average per sentence: 0.1502s

============================================================
opus-mt-en-ROMANCE precision test (CPU vs NPU)
============================================================
Device: npu:0

Loading CPU model...
CPU model loaded

Loading NPU model...
NPU model loaded

Number of test sentences: 3

--- Sentence 1 ---
Source: I love you, but I love him more!
CPU translation: Te amo, pero lo amo más!
CPU time: 2.2079s
NPU translation: Te amo, pero lo amo más!
NPU time: 0.1707s
Translation match: True

--- Sentence 2 ---
Source: Hello, how are you today?
CPU translation: Bonjour, comment estás hoy?
CPU time: 1.5499s
NPU translation: Bonjour, comment estás hoy?
NPU time: 0.1120s
Translation match: True

--- Sentence 3 ---
Source: The quick brown fox jumps over the lazy dog.
CPU translation: La raposa marrón rápida salta sobre el perro preguiçoso.
CPU time: 3.6907s
NPU translation: La raposa marrón rápida salta sobre el perro preguiçoso.
NPU time: 0.2639s
Translation match: True

============================================================
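Precision test summary
============================================================
Exact translation match: PASS
Average CPU inference time: 2.4828s
Average NPU inference time: 0.1822s
NPU speedup: 13.63x

Precision threshold: 1.0%
Translation match rate: PASS

Overall status: PASS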

============================================================
Test complete!
============================================================

Python API examples

Basic translation

import torch
import torch_npu  # noqa: F401  registers the "npu" device with PyTorch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_DIR = "/data/ysws/agentsp/5-20-1/Helsinki-NLP/opus-mt-en-ROMANCE"

# Load tokenizer and model, move the model to the first NPU, switch to inference mode
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_DIR)
model = model.to("npu:0")
model.eval()

# Tokenize, then move every input tensor to the same device as the model
texts = ["I love you, but I love him more!"]
inputs = tokenizer(texts, return_tensors="pt", padding=True)
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

# Beam search with 4 beams; output capped at 100 tokens
with torch.no_grad():
    gen_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=100,
        num_beams=4,
        early_stopping=True
    )

# Convert token ids back to text, dropping special tokens such as </s> and <pad>
translations = tokenizer.batch_decode(gen_ids, skip_special_tokens=True)
print(translations)  # ['Te amo, pero lo amo más!']

Batch translation

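# Continues the basic example: reuses torch, tokenizer and model from above.
# padding=True pads shorter sentences so the batch forms one tensor.
texts = [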
    "I love you, but I love him more!",
    "Hello, how are you today?",
    "Thank you very much for your help."
]

inputs = tokenizer(texts, return_tensors="pt", padding=True)
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    gen_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=100,
        num_beams=4,
        early_stopping=True
    )

translations = tokenizer.batch_decode(gen_ids, skip_special_tokens=True)
for src, trans in zip(texts, translations):
    print(f"{src} -> {trans}")
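For inputs larger than a single comfortable batch, the sentences can be split into fixed-size chunks and translated chunk by chunk, preserving order. The sketch below assumes a batch size of 16 purely for illustration (the right value depends on NPU memory and sentence length); `chunked` is a hypothetical helper, and the commented loop reuses the tokenizer and model from the examples above.

```python
from typing import Iterator, List, Sequence


def chunked(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size sentences, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


print(list(chunked(["a", "b", "c"], 2)))  # [['a', 'b'], ['c']]

# Hypothetical usage with the tokenizer/model loaded above:
# all_translations = []
# for batch in chunked(texts, batch_size=16):
#     inputs = tokenizer(batch, return_tensors="pt", padding=True).to("npu:0")
#     with torch.no_grad():
#         gen_ids = model.generate(**inputs, max_length=100, num_beams=4, early_stopping=True)
#     all_translations.extend(tokenizer.batch_decode(gen_ids, skip_special_tokens=True))
```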

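Model architecture

Architecture: MarianMT (Transformer encoder-decoder)
Encoder: 6 Transformer layers
Decoder: 6 Transformer layers
Hidden size: 512
Attention heads: 8
Feed-forward dimension: 2048
Parameters: ~77M (about 300 MB in fp32)
Source language: English (en)
Target languages: 40+ Romance languages and variants

Component	Description
encoder	6-layer Transformer encoder
decoder	6-layer Transformer decoder
lm_head	Language-model head (vocab_size=65001)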

Model configuration

Key parameters from config.json:

{
  "d_model": 512,
  "encoder_layers": 6,
  "decoder_layers": 6,
  "encoder_attention_heads": 8,
  "decoder_attention_heads": 8,
  "encoder_ffn_dim": 2048,
  "decoder_ffn_dim": 2048,
  "max_position_embeddings": 512,
  "vocab_size": 65001
}
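These config values pin down the model size. A back-of-the-envelope count is sketched below under stated assumptions (not taken from this document): a single embedding matrix shared by encoder, decoder and lm_head, sinusoidal position embeddings left out, a bias on every Linear layer, and one LayerNorm per sub-layer. It lands near 77M parameters, which matches the ~300 MB pytorch_model.bin at 4 bytes per fp32 weight.

```python
def estimate_marian_params(cfg: dict) -> int:
    """Rough parameter count for a MarianMT config (see assumptions above)."""
    d = cfg["d_model"]

    def linear(n_in, n_out):
        # weight matrix plus bias vector
        return n_in * n_out + n_out

    attention = 4 * linear(d, d)  # q, k, v and output projections
    layer_norm = 2 * d            # weight + bias

    enc_layer = (attention + layer_norm
                 + linear(d, cfg["encoder_ffn_dim"])
                 + linear(cfg["encoder_ffn_dim"], d) + layer_norm)
    dec_layer = (2 * (attention + layer_norm)  # self-attention + cross-attention
                 + linear(d, cfg["decoder_ffn_dim"])
                 + linear(cfg["decoder_ffn_dim"], d) + layer_norm)
    embeddings = cfg["vocab_size"] * d  # shared with lm_head (assumption)
    return (cfg["encoder_layers"] * enc_layer
            + cfg["decoder_layers"] * dec_layer
            + embeddings)


config = {
    "d_model": 512, "encoder_layers": 6, "decoder_layers": 6,
    "encoder_ffn_dim": 2048, "decoder_ffn_dim": 2048, "vocab_size": 65001,
}
n = estimate_marian_params(config)
print(f"{n:,} parameters, ~{n * 4 / 1e6:.0f} MB in fp32")
# 77,419,008 parameters, ~310 MB in fp32
```

The count reported by `model.num_parameters()` may differ slightly depending on how transformers stores the position embeddings.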

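FAQ

Q: The NPU translations differ from the CPU translations?

A: First check that the NPU driver is installed correctly and that the CANN environment script (set_env.sh) has been sourced. CPU and NPU floating-point results are generally not bit-identical, so tiny numerical differences are expected; they rarely change the decoded tokens, and in the precision test above all 3 sentences produced identical translations on both devices. If translations still differ, make sure both runs use the same generation parameters (num_beams, max_length) and compare on more sentences.

Q: How can I speed up inference?

A: Batch several sentences into one generate call to raise throughput. The first NPU inference also includes one-time compilation and warm-up overhead, so later calls are faster; run a warm-up call before timing. In the measurements above, the NPU averaged about 13.63x faster than the CPU per sentence.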
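Because the first call pays the compilation cost, a fair timing discards warm-up runs and, on an accelerator, waits for queued device work before stopping the clock. The helper below is a hypothetical sketch of that pattern; on Ascend, `torch.npu.synchronize` (available after importing torch_npu) is assumed as the `sync` callable.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object], warmup: int = 1, runs: int = 5,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time of fn() in seconds.

    Warm-up calls run first and are discarded, so one-time compilation is
    excluded. `sync`, if given, should block until device work finishes.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical usage with the model/inputs from the Python API examples:
# t = benchmark(lambda: model.generate(**inputs, num_beams=4, max_length=100),
#               sync=torch.npu.synchronize)
```

The median is used instead of the mean so that a single slow outlier run does not dominate the result.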

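Q: Which target languages are supported?

A: The model translates English into Romance languages, including:

French (fr, fr_BE, fr_CA, fr_FR)
Spanish (es and its regional variants)
Portuguese (pt, pt_br, pt_BR, pt_PT)
Italian (it, it_IT)
Romanian (ro)
Catalan (ca)
and others, 40+ languages and variants in total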
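According to the Helsinki-NLP model card, this multi-target model expects a sentence-initial token of the form `>>id<<` (for example `>>fr<<` or `>>es<<`) to select the output language. The examples in this document appear to have been run without one, which likely explains mixed-language outputs such as "Bonjour, comment estás hoy?". A minimal sketch of prefixing the token before tokenization follows; `add_target_token` is a hypothetical helper.

```python
def add_target_token(texts, lang):
    """Prefix each source sentence with the >>lang<< target-language token."""
    if not lang or any(c.isspace() for c in lang):
        raise ValueError(f"invalid language id: {lang!r}")
    return [f">>{lang}<< {t}" for t in texts]


print(add_target_token(["Hello, how are you today?"], "fr"))
# ['>>fr<< Hello, how are you today?']

# Hypothetical usage with the tokenizer/model from the Python API examples:
# inputs = tokenizer(add_target_token(texts, "es"), return_tensors="pt", padding=True)
# Recent transformers versions expose tokenizer.supported_language_codes, which
# lists the >>id<< tokens in the vocabulary; verify it exists in your version.
```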

References

Original model: https://huggingface.co/Helsinki-NLP/opus-mt-en-ROMANCE
Helsinki-NLP: https://github.com/Helsinki-NLP
HuggingFace Transformers: https://huggingface.co/transformers

License

This project is licensed under Apache-2.0.