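This document records the quick-deployment and validation results for AI-ModelScope_stella-large-zh-v2 on an Ascend NPU (Ascend 910) environment. The model is a text embedding model built on the HuggingFace transformers framework.
Related download links: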

Install the dependencies:

    pip install transformers torch

Minimal inference example:

    import torch
    import torch_npu  # Ascend adapter for PyTorch; must be installed for torch.npu to exist
    from transformers import AutoTokenizer, AutoModel

    # Fall back to CPU when no NPU is visible.
    device = torch.device("npu:0" if torch.npu.is_available() else "cpu")
    tokenizer = AutoTokenizer.from_pretrained("AI-ModelScope_stella-large-zh-v2", trust_remote_code=True)
    model = AutoModel.from_pretrained("AI-ModelScope_stella-large-zh-v2", trust_remote_code=True)
    model = model.to(device).eval()

    texts = ["Hello world"]
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    # Take the hidden state of the first token as the sentence embedding.
    embeddings = outputs.last_hidden_state[:, 0, :]
    print(f"Embedding shape: {embeddings.shape}")

NPU vs. CPU logits numerical consistency:

| Metric | Value |
|---|---|
| Top-1 agreement | 4/4 |
| Max Logit Diff Ratio | 0.002291 |
| Avg KL Divergence | 0.0 |
| Result | PASS |
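
The source does not say how the three metrics in the table are computed. The sketch below shows one plausible way to derive them from two batches of logits, purely as an illustration. The function name `compare_logits`, the definition of the diff ratio (largest absolute difference divided by the largest absolute CPU logit), the KL direction (CPU relative to NPU), and the sample values are all assumptions, not the validation script that produced the numbers above.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(row)
    log_total = math.log(sum(math.exp(x - peak) for x in row))
    return [x - peak - log_total for x in row]


def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=row.__getitem__)


def compare_logits(cpu_logits, npu_logits, tolerance=0.01):
    """Compare two batches of logits given as lists of equal-length float rows."""
    pairs = list(zip(cpu_logits, npu_logits))

    # Top-1 agreement: rows whose arg-max index is the same on both devices.
    top1_matches = sum(argmax(c) == argmax(n) for c, n in pairs)

    # Diff ratio (assumed definition): largest absolute element-wise
    # difference, normalised by the largest absolute CPU logit.
    max_abs_diff = max(abs(c - n) for cr, nr in pairs for c, n in zip(cr, nr))
    max_abs_ref = max(abs(c) for cr in cpu_logits for c in cr)
    diff_ratio = max_abs_diff / max_abs_ref if max_abs_ref > 0 else max_abs_diff

    # Average KL(softmax(cpu) || softmax(npu)) over all rows.
    kl_total = 0.0
    for cr, nr in pairs:
        log_p, log_q = log_softmax(cr), log_softmax(nr)
        kl_total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    avg_kl = kl_total / len(pairs)

    return {
        "top1_matches": top1_matches,
        "total": len(pairs),
        "max_diff_ratio": diff_ratio,
        "avg_kl": avg_kl,
        "passed": top1_matches == len(pairs) and diff_ratio < tolerance,
    }


# Made-up sample values, only to show the call shape.
cpu_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
npu_logits = [[2.001, 0.999, 0.1], [0.5, 2.499, 0.301]]
print(compare_logits(cpu_logits, npu_logits))
```

With real tensors, the rows could be obtained by running the same inputs on both devices and converting each output with `.cpu().tolist()`.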

Inference performance:

| Item | Value |
|---|---|
| Hardware | Ascend 910B |
| Average inference time | 6.95 ms |
| Test conditions | batch=8, max_length=128 |
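
The source does not describe how the average inference time was measured. A common pattern is to run a few warm-up iterations, then time a fixed number of forward passes and divide by the count, synchronizing the device before reading the clock. The helper below is a minimal sketch of that pattern. The name `average_latency_ms`, its parameters, and the default iteration counts are hypothetical, and the workload shown is a stand-in rather than the model.

```python
import time


def average_latency_ms(run_once, warmup=3, iterations=20, synchronize=None):
    """Return the mean wall-clock time of run_once() in milliseconds.

    synchronize is an optional callable that blocks until queued device
    work has finished; without it, asynchronous kernels may be under-timed.
    """
    def wait():
        if synchronize is not None:
            synchronize()

    # Warm-up runs absorb one-off costs such as kernel compilation.
    for _ in range(warmup):
        run_once()
    wait()

    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    wait()  # make sure all timed work is done before stopping the clock
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations


# Stand-in workload so the sketch runs anywhere.
latency = average_latency_ms(lambda: sum(range(10_000)))
print(f"Average latency: {latency:.3f} ms")
```

On an NPU, the call would presumably look like `average_latency_ms(lambda: model(**inputs), synchronize=torch.npu.synchronize)` inside a `torch.no_grad()` block, assuming `torch_npu` is imported and the inputs match the batch=8, max_length=128 condition in the table.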
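
The accuracy check measures numerical consistency between NPU and CPU outputs (relative error of the logits below 1%), not the model's task accuracy.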