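This document records the adaptation and validation results for iic/nlp_seqgpt-560m on Huawei Ascend NPUs.

nlp_seqgpt-560m is a causal language model based on the BLOOM architecture with 560M parameters, provided by the ModelScope community.

Model download location: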

| Component | Version |
|---|---|
| NPU | Ascend 910 (64 GB HBM) |
| Driver version | npu-smi 25.5.2 |
| PyTorch | 2.9.0 |
| torch_npu | available |
| transformers | 4.57.6 |
| vLLM | 0.18.0 (vLLM-Ascend) |
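
The versions in the table above can be compared against a local installation with a short script. This is a minimal sketch, not part of the original validation: the distribution names in `PACKAGES` are assumptions about how these components are published on the target machine, and the helper `collect_versions` is hypothetical.

```python
from importlib import metadata

# Assumed distribution names for the components listed in the table.
PACKAGES = ["torch", "torch_npu", "transformers", "vllm", "vllm-ascend"]


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` entry only means the distribution was not found under that name; it does not prove the component is missing.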

Run a quick inference check with transformers on the NPU:

    export MODEL_DIR=/opt/atomgit/iic/nlp_seqgpt-560m/model/iic/nlp_seqgpt-560m
    python3 -c "
    from transformers import AutoModelForCausalLM, AutoTokenizer
    # torch_npu must be imported so that the npu device is registered
    import torch, torch_npu
    model_path = '$MODEL_DIR'
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to('npu:0')
    prompt = 'The capital of France is'
    inputs = tokenizer(prompt, return_tensors='pt').to('npu:0')
    with torch.no_grad():
        outputs = model.generate(**inputs, max_length=30)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    "

Start the vLLM server:

    vllm serve /opt/atomgit/iic/nlp_seqgpt-560m/model/iic/nlp_seqgpt-560m \
        --dtype bfloat16 \
        --port 8000 \
        --trust-remote-code \
        --max-model-len 2048 \
        --gpu-memory-utilization 0.85 \
        --enforce-eager \
        --served-model-name seqgpt

After the service starts, verify it:

    # List the served models
    curl -sf http://127.0.0.1:8000/v1/models
    # Send a completion request
    curl -sf http://127.0.0.1:8000/v1/completions \
        -H 'Content-Type: application/json' \
        -d '{"model":"seqgpt","prompt":"The capital of France is","temperature":0,"max_tokens":16}'

Expected result:

/v1/models returns HTTP 200.

Test conditions: a single Ascend 910 card, vLLM 0.18.0, max_tokens=32, temperature=0, average over 5 requests.

| Prompt | Average latency (s) |
|---|---|
| The capital of France is | 0.079 |
| Hello, how are you | 0.077 |
| In the beginning | 0.078 |
| Once upon a time | 0.077 |
| The meaning of life is | 0.078 |
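
The measurement procedure described by the test conditions (5 requests per prompt, averaged) could be reproduced roughly as follows. This is a sketch under assumptions: the original timing script is not shown, so `complete` and `average_latency` are hypothetical helpers, and the base URL and model name are taken from the `vllm serve` command in this document.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # port taken from the vllm serve command


def complete(prompt, max_tokens=32):
    """Send one greedy completion request and return the generated text."""
    payload = {
        "model": "seqgpt",
        "prompt": prompt,
        "temperature": 0,
        "max_tokens": max_tokens,
    }
    request = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


def average_latency(send, prompt, runs=5):
    """Call send(prompt) `runs` times and return the mean wall-clock time in seconds."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        send(prompt)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)
```

With the server running, `average_latency(complete, "The capital of France is")` returns the mean end-to-end latency for that prompt. The figure includes HTTP overhead on the client side, so it may differ slightly from numbers measured another way.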

Sample completions:

| Prompt | Completion |
|---|---|
| The capital of France is | Paris |
| Hello, how are | you |

Accuracy check (expected vs. actual output):

| Input | Expected | Actual output |
|---|---|---|
| The capital of France is | Paris | Paris |
| Hello, how are | you | you |
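
One way to automate the expected-versus-actual comparison in the table above is a prefix match on the completion. This is only an illustrative sketch: the matching rule, `matches_expected`, and `accuracy` are assumptions, and eval/accuracy_run.py may use a different criterion.

```python
# Prompts and expected completions copied from the table above.
CASES = [
    ("The capital of France is", "Paris"),
    ("Hello, how are", "you"),
]


def matches_expected(expected, actual):
    """Return True if the completion, ignoring leading whitespace, starts with the expected text."""
    return actual.lstrip().startswith(expected)


def accuracy(cases, generate):
    """Return the fraction of cases whose generated completion matches the expected text.

    `generate` is any callable that maps a prompt string to a completion string.
    """
    hits = sum(matches_expected(expected, generate(prompt)) for prompt, expected in cases)
    return hits / len(cases)
```

Passing the generation function in as a parameter keeps the check independent of whether the completion comes from transformers or from the vLLM endpoint.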
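
Run the accuracy evaluation:

    python3 eval/accuracy_run.py

Run the performance evaluation:

    python3 eval/accuracy_run_perf.py

The model config sets tie_word_embeddings=True; when loading with vLLM, pass --trust-remote-code.

Set --gpu-memory-utilization 0.85 to avoid OOM.

Pass --enforce-eager to avoid torch.compile compatibility issues.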
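
The tie_word_embeddings setting mentioned in the notes can be confirmed before launching vLLM by reading the model's config file. This is a minimal sketch that assumes the model directory contains a Hugging Face style `config.json`; the helper `read_tie_word_embeddings` is hypothetical.

```python
import json
from pathlib import Path


def read_tie_word_embeddings(model_dir):
    """Return the tie_word_embeddings value from config.json, or None if the key is absent."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("tie_word_embeddings")
```

For example, `read_tie_word_embeddings(os.environ["MODEL_DIR"])` (after `import os`) reads the directory exported earlier in this document.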