
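OPUS-MT-EN-MG Ascend NPU Deployment Guide

Project Overview

OPUS-MT-EN-MG is Helsinki-NLP's English-to-Malagasy machine translation model (MarianMT). It is built on the Transformer architecture and handles EN→MG translation.

Features

  • Inference acceleration on Ascend NPU
  • CPU vs. NPU precision comparison test (outputs match exactly on the tested input)
  • Efficient neural machine translation
  • Compatible with HuggingFace transformers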

Requirements

  • Hardware: Huawei Ascend 910 series NPU
  • CANN: 8.0.RC1 or later
  • PyTorch: 2.0+ with torch_npu
  • Docker: container named test-modelagent
  • transformers: 4.8+
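The version floors above can be checked from Python before running anything on the NPU. The following is a minimal sketch, not part of the repository: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and the comparison is a simplified numeric one that ignores pre-release and local version suffixes.

```python
from importlib import metadata, util

# Minimum versions copied from the requirements list above.
MINIMUMS = {"torch": "2.0", "transformers": "4.8"}


def version_tuple(version):
    """Turn '2.1.0+cpu' or '4.8' into a tuple of ints such as (2, 1, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        leading = ""
        for ch in piece:
            if not ch.isdigit():
                break
            leading += ch
        if not leading:
            break  # stop at the first non-numeric component, e.g. 'post3'
        parts.append(int(leading))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a {package: status} report; never raises if a package is absent."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = f"{installed} ({'ok' if ok else 'below ' + minimum})"
    # torch_npu is only checked for importability, not for a specific version.
    report["torch_npu"] = "found" if util.find_spec("torch_npu") else "missing"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

The CANN version is not covered here, since it is installed outside of Python.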

Directory Structure

opus-mt-en-mg-ascend/
├── inference.py          # inference test script
├── log.txt               # test log
├── README.md             # this document
├── test_sample.txt       # test samples
├── inference_result.json # inference results
└── precision_result.json # precision test results

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh
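To confirm that the script took effect in the current shell, the environment can be inspected from Python. This is a rough sketch under an assumption: it presumes that set_env.sh exports `ASCEND_TOOLKIT_HOME` and adds Ascend directories to `LD_LIBRARY_PATH`, which may vary between CANN releases. The function name `ascend_env_report` is hypothetical.

```python
import os


def ascend_env_report(environ=None):
    """Report whether Ascend-related variables look set.

    Assumption: set_env.sh exports ASCEND_TOOLKIT_HOME and extends
    LD_LIBRARY_PATH with Ascend directories. Adjust the names if your
    CANN release differs.
    """
    env = os.environ if environ is None else environ
    toolkit_home = env.get("ASCEND_TOOLKIT_HOME", "")
    library_dirs = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    return {
        "toolkit_home_set": bool(toolkit_home),
        "ascend_on_ld_library_path": any("Ascend" in p for p in library_dirs),
    }


if __name__ == "__main__":
    for check, ok in ascend_env_report().items():
        print(f"{check}: {'yes' if ok else 'no'}")
```

If either check prints "no", re-running the source command in the same shell is the first thing to try.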

3. Prepare the model files

The model files are located in /data/ysws/agentsp/5-18-2/opus-mt-en-mg/Helsinki-NLP/opus-mt-en-mg/:

  • pytorch_model.bin - model weights (about 295 MB)
  • config.json - model configuration
  • source.spm / target.spm - SentencePiece models
  • vocab.json - vocabulary
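A quick check that all of the listed files are present can save a failed model load later. The sketch below is illustrative only; the helper `missing_model_files` is a hypothetical name, and the file list is copied from the bullets above.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "pytorch_model.bin",
    "config.json",
    "source.spm",
    "target.spm",
    "vocab.json",
]


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    model_dir = "/data/ysws/agentsp/5-18-2/opus-mt-en-mg/Helsinki-NLP/opus-mt-en-mg/"
    missing = missing_model_files(model_dir)
    print("All model files present" if not missing else f"Missing: {missing}")
```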

4. Install dependencies

pip install transformers sentencepiece torch_npu

Usage

Method 1: Normal Inference Mode

cd /data/ysws/agentsp/5-18-2/opus-mt-en-mg-ascend/
python3 inference.py

Method 2: Precision Test Mode (CPU vs NPU)

cd /data/ysws/agentsp/5-18-2/opus-mt-en-mg-ascend/
python3 inference.py precision_test

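Test Validation

Precision Test Results

| Metric | Measured | Threshold | Status |
| --- | --- | --- | --- |
| Output match | True | 100% | PASS |
| NPU speedup | 11.17x | > 10x | PASS |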

Performance Data

| Operation | Time |
| --- | --- |
| CPU inference time | 2.020s |
| NPU inference time | 0.181s |
| Speedup | 11.17x |
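The pass/fail logic implied by the two tables (identical outputs and a speedup above 10x) can be sketched as follows. This is an illustration, not the code in inference.py, which is not shown here; the function name `summarize_precision_run` and its return layout are hypothetical. Note that dividing the rounded timings gives roughly 11.16, so the logged 11.17x was presumably computed from unrounded measurements.

```python
def summarize_precision_run(cpu_texts, npu_texts, cpu_seconds, npu_seconds,
                            min_speedup=10.0):
    """Combine the output comparison and the speedup into one verdict.

    The threshold of 10x is taken from the precision results table; how
    inference.py actually combines the two checks is an assumption.
    """
    outputs_match = list(cpu_texts) == list(npu_texts)
    speedup = cpu_seconds / npu_seconds
    passed = outputs_match and speedup > min_speedup
    return {
        "outputs_match": outputs_match,
        "speedup": speedup,
        "status": "PASS" if passed else "FAIL",
    }


if __name__ == "__main__":
    text = ["Miarahaba, ahoana ny aminao amin'izao fotoana izao?"]
    result = summarize_precision_run(text, text, cpu_seconds=2.020, npu_seconds=0.181)
    print(f"match={result['outputs_match']} "
          f"speedup={result['speedup']:.2f}x status={result['status']}")
```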

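Translation Examples

| Input (EN) | Output (MG) |
| --- | --- |
| "Hello, how are you today?" | "Miarahaba, ahoana ny aminao amin'izao fotoana izao?" |
| "I am very happy to see you." | "Very fotsy ny fahazotoana hitananao." |
| "The weather is nice today." | "Tsara ny teny anio." |

Result: the CPU and NPU outputs match exactly. The author reports good translation quality, although the log below contains no quality metric.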

Test Log

The complete test log follows:

============================================================
OPUS-MT-EN-MG NPU Test
Output: /data/ysws/agentsp/5-18-2/opus-mt-en-mg-ascend
============================================================

============================================================
OPUS-MT-EN-MG Inference Test (NPU)
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-18-2/opus-mt-en-mg/Helsinki-NLP/opus-mt-en-mg

Loading tokenizer...
Loading model...
Loading weights: 100%|██████████| 258/258 [00:00<00:00, 12617.83it/s]
[transformers] Both `max_new_tokens` (=50) and `max_length`(=512) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information: (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)

Input text: ['Hello, how are you today?']
Input shape: torch.Size([1, 8])
Generated text: ["Miarahaba, ahoana ny aminao amin'izao fotoana izao?"]
Inference time: 1.577s

============================================================
Precision Test (CPU vs NPU)
============================================================

Loading model on CPU...
Loading weights: 100%|██████████| 258/258 [00:00<00:00, 12563.92it/s]
[transformers] Both `max_new_tokens` (=50) and `max_length`(=512) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information: (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Running inference on CPU...

Loading model on NPU...
Loading weights: 100%|██████████| 258/258 [00:00<00:00, 12675.77it/s]
[transformers] Both `max_new_tokens` (=50) and `max_length`(=512) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information: (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Running inference on NPU...

CPU inference time: 2.020s
NPU inference time: 0.181s
Speedup: 11.17x
CPU output: ["Miarahaba, ahoana ny aminao amin'izao fotoana izao?"]
NPU output: ["Miarahaba, ahoana ny aminao amin'izao fotoana izao?"]
Output texts match: True
Status: PASS

============================================================
Creating Test Sample
============================================================
Saved test sample
  1. Hello, how are you today?
  2. I am very happy to see you.
  3. The weather is nice today.

============================================================
Test Complete!

Python API Usage Example

import torch
import torch_npu  # registers the "npu" device with PyTorch
from transformers import MarianTokenizer, MarianMTModel

MODEL_DIR = "/data/ysws/agentsp/5-18-2/opus-mt-en-mg/Helsinki-NLP/opus-mt-en-mg"
DEVICE = "npu:0"

# Load the tokenizer and model, then move the model to the NPU.
tokenizer = MarianTokenizer.from_pretrained(MODEL_DIR)
model = MarianMTModel.from_pretrained(MODEL_DIR).to(DEVICE)
model.eval()

# Tokenize the source sentences and move every tensor to the same device.
src_texts = ["Hello, how are you today?"]
inputs = tokenizer(src_texts, return_tensors="pt", padding=True)
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

# Pass the attention mask together with input_ids so padded batches decode correctly.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(translations)  # ["Miarahaba, ahoana ny aminao amin'izao fotoana izao?"]
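The test log reports 1.577s for the first NPU inference but 0.181s in the precision test. One possible explanation, not confirmed by the source, is one-time warm-up cost on the first call. The sketch below shows a generic way to time a call with warm-up runs excluded; `time_call` is a hypothetical helper, and the optional `sync` argument is meant for a device barrier such as `torch.npu.synchronize` (assuming torch_npu is installed).

```python
import time


def time_call(fn, warmup=1, repeats=3, sync=None):
    """Return the best wall-clock time of fn() over `repeats` timed runs.

    `warmup` untimed calls run first. `sync`, if given, is called after each
    run so that asynchronous device work finishes before the clock stops.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload; with the example above one might instead pass
    #   lambda: model.generate(**inputs, max_new_tokens=50)
    # and sync=torch.npu.synchronize.
    best = time_call(lambda: sum(range(100_000)), warmup=2, repeats=5)
    print(f"best of 5: {best:.6f}s")
```

Taking the minimum rather than the mean is one design choice among several; it reduces the influence of unrelated system noise on short measurements.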

Model Architecture

  • Architecture: Marian (Transformer)
  • Encoder: 6 Transformer layers
  • Decoder: 6 Transformer layers
  • Hidden size (d_model): 512
  • Attention heads: 8
  • Language direction: English → Malagasy

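FAQ

Q: What if the precision test fails?

A: Check that the NPU driver is installed correctly, and make sure the CANN environment script has been sourced (source /usr/local/Ascend/ascend-toolkit/set_env.sh).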
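As a starting point for that check, the following sketch reports whether PyTorch can see an NPU. It is a hedged example rather than an official diagnostic: `diagnose_npu` is a hypothetical name, and it assumes torch_npu exposes `torch.npu.is_available()` and `torch.npu.device_count()` in the same style as the CUDA API.

```python
import importlib


def diagnose_npu():
    """Return a one-line description of the NPU setup as seen from Python."""
    try:
        torch = importlib.import_module("torch")
    except ImportError as exc:
        return f"torch is not importable: {exc}"
    try:
        # Importing torch_npu registers the NPU backend; it can fail if the
        # CANN libraries are not on the library path.
        importlib.import_module("torch_npu")
    except Exception as exc:
        return f"torch_npu could not be loaded: {exc}"
    if not hasattr(torch, "npu"):
        return "torch_npu loaded, but torch.npu is not present"
    if not torch.npu.is_available():
        return "torch_npu loaded, but no NPU is available (check driver and CANN env)"
    return f"NPU available, device count: {torch.npu.device_count()}"


if __name__ == "__main__":
    print(diagnose_npu())
```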

References

  • Original model: https://huggingface.co/Helsinki-NLP/opus-mt-en-mg
  • MarianMT: https://huggingface.co/docs/transformers/main_classes/models#marianmt

License

This project is released under the Apache-2.0 license.