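opus-mt_tiny_zho-eng is a small Chinese-to-English machine translation model developed by Helsinki-NLP. It is a MarianMT model, which is built on the Transformer architecture. This is the tiny variant, with a small parameter count, and it targets Chinese-to-English translation specifically.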

opus-mt_tiny_zho-eng-ascend/
├── inference.py # inference test script
├── log.txt # test log
├── README.md # this document
├── test_sample.txt # test samples
├── inference_result.json # inference results
└── precision_result.json # precision test results

Enter the container and load the Ascend toolkit environment:

docker exec -it test-modelagent bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh

The model files are located in /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng/Helsinki-NLP/opus-mt_tiny_zho-eng/.

Install the Python dependencies:

pip install transformers torch_npu sacremoses -i https://pypi.huaweicloud.com/repository/pypi/simple/

Run the inference script to translate Chinese to English:

cd /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng-ascend/
python3 inference.py inference

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:

cd /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng-ascend/
python3 inference.py precision_test

Run both tests:

cd /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng-ascend/
python3 inference.py all

| Parameter | Description | Default |
|---|---|---|
| --mode | Test mode: inference, precision_test, or all | all |

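
The parameter table names a `--mode` option, while the commands above pass the mode as a bare positional argument. The real `inference.py` is not shown here, so the following is only a sketch of how a script could accept both forms with `argparse`; the function name `parse_mode` and the argument name `positional_mode` are hypothetical.

```python
import argparse

MODES = ("inference", "precision_test", "all")


def parse_mode(argv):
    """Return the selected test mode, accepting `--mode X` or a bare `X`."""
    parser = argparse.ArgumentParser(description="Mode selection sketch (hypothetical)")
    # Optional positional form, e.g. `python3 inference.py inference`.
    parser.add_argument("positional_mode", nargs="?", choices=MODES, default=None)
    # Flag form, e.g. `python3 inference.py --mode inference`.
    parser.add_argument("--mode", choices=MODES, default=None)
    args = parser.parse_args(argv)
    # Prefer the flag, then the positional value, then the documented default.
    return args.mode or args.positional_mode or "all"


if __name__ == "__main__":
    import sys
    print(parse_mode(sys.argv[1:]))
```

With no arguments the sketch falls back to `all`, matching the default listed in the table.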
| Metric | Measured | Threshold | Status |
|---|---|---|---|
| CPU inference time | 0.227s | - | - |
| NPU inference time | 0.081s | - | - |
| Speedup | 2.81x | > 1x | PASS |
| Output text consistency | Identical | - | PASS |
| CPU vs NPU output match | True | - | PASS |

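
The pass criteria in this table (speedup above 1x, identical output text) can be expressed as a small check. This is a minimal sketch, not the logic of the actual test script; the function name `evaluate_precision` and its return keys are assumptions.

```python
def evaluate_precision(cpu_time, npu_time, cpu_texts, npu_texts, min_speedup=1.0):
    """Compare CPU and NPU runs: speedup ratio, text equality, overall status."""
    speedup = cpu_time / npu_time
    texts_match = cpu_texts == npu_texts
    # PASS only when the decoded texts are identical and the NPU is faster.
    status = "PASS" if texts_match and speedup > min_speedup else "FAIL"
    return {"speedup": speedup, "texts_match": texts_match, "status": status}


result = evaluate_precision(
    cpu_time=0.227,
    npu_time=0.081,
    cpu_texts=["I don't know what you're doing."],
    npu_texts=["I don't know what you're doing."],
)
print(f"Speedup: {result['speedup']:.2f}x, status: {result['status']}")
```

Dividing the rounded timings gives about 2.80x; the 2.81x in the log was presumably computed from the unrounded measurements.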
| Operation | Time |
|---|---|
| Inference test, NPU time | 0.914s |
| Precision test, CPU time | 0.227s |
| Precision test, NPU time | 0.081s |

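
The two NPU timings differ by roughly a factor of ten (0.914s versus 0.081s). The log does not explain the gap; one plausible cause, stated here as an assumption, is one-time warm-up cost in the first NPU call. A timing helper that discards warm-up runs makes such measurements easier to compare. The helper name `time_call` is hypothetical and the workload below is a stand-in for a real `model.generate` call.

```python
import time


def time_call(fn, warmup=1, repeats=3):
    """Run fn `warmup` times untimed, then return the best of `repeats` timed runs."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with a callable that runs generation and decoding.
    best = time_call(lambda: sum(range(100_000)))
    print(f"Best of 3 after warm-up: {best:.6f}s")
```

Accelerator work can run asynchronously, so the timed callable should include a step that forces completion, such as decoding the generated tokens on the host.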
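| Input (Chinese, pinyin) | Output (English) |
|---|---|
| Ni hao, jin tian guo de zen me yang? | I don't know what you're doing. |

Result: the CPU and NPU produce identical translations, which confirms that the NPU computation matches the CPU reference. The sample input is pinyin romanization rather than Chinese characters, so this check measures CPU/NPU consistency, not translation quality.

The full test log is saved in log.txt:
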
============================================================
OPUS-MT-TINY-ZHO-ENG NPU Test
Model: Helsinki-NLP/opus-mt_tiny_zho-eng
Output: /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng-ascend
============================================================
============================================================
OPUS-MT-TINY-ZHO-ENG Inference Test (NPU)
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng/Helsinki-NLP/opus-mt_tiny_zho-eng
Loading tokenizer...
Loading model...
Loading weights: 100%|██████████| 151/151 [00:00<00:00, 4909.61it/s]
Input text: ['Ni hao, jin tian guo de zen me yang?']
Input shape: torch.Size([1, 19])
Generated text: ["I don't know what you're doing."]
Inference time: 0.914s
Inference result saved to /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng-ascend/inference_result.json
============================================================
Precision Test (CPU vs NPU)
============================================================
Using device: npu:0
Loading tokenizer...
Loading model on CPU...
Loading weights: 100%|██████████| 151/151 [00:00<00:00, 4491.77it/s]
Running inference on CPU...
Loading model on npu:0...
Loading weights: 100%|██████████| 151/151 [00:00<00:00, 4719.23it/s]
Running inference on NPU...
CPU inference time: 0.227s
NPU inference time: 0.081s
Speedup: 2.81x
CPU output: ["I don't know what you're doing."]
NPU output: ["I don't know what you're doing."]
Output texts match: True
Status: PASS
Precision result saved to /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng-ascend/precision_result.json
============================================================
Creating Test Sample
============================================================
Saved test sample: /data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng-ascend/test_sample.txt
1. Ni hao, jin tian guo de zen me yang?
2. Wo hen gao xing ren shi ni.
3. Zi dong fan yi hen you yong.
4. Jin tian tian qi bu cuo.
============================================================
Test Complete!
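============================================================

import torch
import torch_npu  # registers the "npu" device backend with PyTorch
from transformers import MarianMTModel, MarianTokenizer

MODEL_DIR = "/data/ysws/agentsp/5-17/opus-mt_tiny_zho-eng/Helsinki-NLP/opus-mt_tiny_zho-eng"

# Load the tokenizer and model from the local directory, then move the model to the NPU.
tokenizer = MarianTokenizer.from_pretrained(MODEL_DIR)
model = MarianMTModel.from_pretrained(MODEL_DIR)
model = model.to("npu:0").eval()

# Tokenize the input and move every tensor to the same device as the model.
src_texts = ["Ni hao, jin tian guo de zen me yang?"]
inputs = tokenizer(src_texts, return_tensors="pt", padding=True)
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

# Generate without tracking gradients, then decode the token IDs back to text.
with torch.no_grad():
    outputs = model.generate(**inputs)
translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(translations)

# Batch translation: pass several sentences at once; padding=True aligns their lengths.
src_texts = [
    "Ni hao, jin tian guo de zen me yang?",
    "Wo hen gao xing ren shi ni.",
    "Zi dong fan yi hen you yong."
]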
inputs = tokenizer(src_texts, return_tensors="pt", padding=True)
inputs = {k: v.to("npu:0") for k, v in inputs.items()}
with torch.no_grad():
    outputs = model.generate(**inputs)
translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
# Print each source sentence next to its translation.
for src, trans in zip(src_texts, translations):
    print(f"{src} -> {trans}")

| Component | Description |
|---|---|
| encoder | 6-layer Transformer encoder |
| decoder | 2-layer Transformer decoder (tiny) |
| lm_head | Language-model output head |

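
The layer counts above, combined with the dimensions this document lists from config.json (`d_model` 256, FFN width 1536, vocabulary 32001), allow a rough parameter estimate. The sketch below assumes a standard Marian layout: one embedding matrix shared by the encoder, decoder, and output head, non-learned sinusoidal position embeddings, and biases on every projection. None of these assumptions is confirmed by this document, and the function name is hypothetical.

```python
def marian_param_estimate(d_model, ffn_dim, vocab_size, enc_layers, dec_layers):
    """Rough learned-parameter count for a Marian-style encoder-decoder."""
    # One attention block: q, k, v, and output projections, each with a bias.
    attn = 4 * (d_model * d_model + d_model)
    # Feed-forward block: two linear layers with biases.
    ffn = 2 * d_model * ffn_dim + ffn_dim + d_model
    # One layer norm: scale and shift vectors.
    layer_norm = 2 * d_model
    enc_layer = attn + ffn + 2 * layer_norm
    # Decoder layers add cross-attention and a third layer norm.
    dec_layer = 2 * attn + ffn + 3 * layer_norm
    # Assumption: a single embedding matrix is shared across the model.
    embeddings = vocab_size * d_model
    return embeddings + enc_layers * enc_layer + dec_layers * dec_layer


total = marian_param_estimate(
    d_model=256, ffn_dim=1536, vocab_size=32001, enc_layers=6, dec_layers=2
)
print(f"{total:,} parameters (about {total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to roughly 17.1M parameters, with the shared embedding matrix accounting for nearly half of them.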
Key parameters extracted from config.json:
{
"model_type": "marian",
"d_model": 256,
"encoder_layers": 6,
"decoder_layers": 2,
"encoder_attention_heads": 8,
"decoder_attention_heads": 8,
"encoder_ffn_dim": 1536,
"decoder_ffn_dim": 1536,
"vocab_size": 32001,
"max_position_embeddings": 256,
"pad_token_id": 32000,
"eos_token_id": 0,
"bos_token_id": 0
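}

A: Check that the NPU driver is installed correctly. The MarianMT model produces identical output on CPU and NPU, which confirms that the NPU computation is correct.
A: Although the tiny variant has few parameters, it performs well on basic everyday conversational translation. Complex sentences may need a larger model.
A: Batching can significantly improve throughput. In the precision test, NPU inference ran at 2.81x the speed of CPU inference.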
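
The batching advice in the last answer can be sketched as a helper that splits a long list of sentences into fixed-size chunks and translates each chunk in one call. The names `batched`, `translate_in_batches`, and `translate_fn` are hypothetical; `translate_fn` stands for a wrapper around the tokenizer, `model.generate`, and `batch_decode` steps shown earlier, and a suitable batch size depends on available device memory.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_in_batches(texts, translate_fn, batch_size=8):
    """Apply `translate_fn` (list of str -> list of str) batch by batch, keeping order."""
    results = []
    for batch in batched(texts, batch_size):
        results.extend(translate_fn(batch))
    return results


if __name__ == "__main__":
    # Stub translator for demonstration only; it upper-cases instead of translating.
    demo = translate_in_batches(["a", "b", "c"], lambda b: [t.upper() for t in b], batch_size=2)
    print(demo)
```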

This project is licensed under the Apache-2.0 license.