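This document records the quick deployment and validation results for LanceFerrari/lance_finbert_1 on Ascend NPU (Ascend910).

The model is a BERT-base text encoder (hidden_size=768, 12 layers, 12 heads) built on the HuggingFace transformers framework, and it can be loaded for inference in a single call.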

Related download links:

Reference documents:

| Component | Version |
|---|---|
| torch | 2.1.0 |
| torch_npu | 2.1.0 |
| transformers | >=4.37.0 |
| CANN | 8.5.RC1 |
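
As a quick sanity check, the installed Python package versions can be compared against the table above. The following is a minimal sketch that relies only on the standard library; the helper name `installed_versions` is hypothetical, and CANN is not covered because it is not installed as a Python package.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torch_npu", "transformers")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so the output can be compared with the table.
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```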

```bash
pip install transformers torch
```

```python
import torch
from transformers import AutoTokenizer, AutoModel

# torch.npu only exists after torch_npu has been imported.
try:
    import torch_npu  # noqa: F401
    use_npu = torch.npu.is_available()
except ImportError:
    use_npu = False

device = torch.device("npu:0" if use_npu else "cpu")

model_name = "LanceFerrari/lance_finbert_1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model = model.to(device).eval()

texts = ["今天天气真好", "这个产品太差了"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# Use the [CLS] token's hidden state as the sentence embedding.
embeddings = outputs.last_hidden_state[:, 0, :]
print(f"Embedding shape: {embeddings.shape}")
print(f"First 5 dimensions of the embedding: {embeddings[0][:5].tolist()}")
```

Numerical consistency of NPU vs. CPU logits:

| Metric | Value |
|---|---|
| Top-1 agreement | 4/4 |
| Max Logit Diff Ratio | 0.000465 |
| Avg KL Divergence | 1e-06 |
| Result | PASS |
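
The source does not spell out how these metrics are defined, so the sketch below shows one plausible way to compute them in plain Python. The definitions are assumptions: Top-1 agreement counts samples whose argmax matches, the max logit diff ratio is the largest absolute difference divided by the largest absolute CPU logit, and the KL divergence is taken between the softmax distributions of the CPU and NPU outputs. The function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    """Index of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def compare_logits(cpu_logits, npu_logits):
    """Compare two equally shaped nested lists of logits (samples x classes)."""
    pairs = list(zip(cpu_logits, npu_logits))
    top1_matches = sum(argmax(c) == argmax(n) for c, n in pairs)
    max_abs_diff = max(abs(a - b) for c, n in pairs for a, b in zip(c, n))
    max_abs_ref = max(abs(a) for c in cpu_logits for a in c)
    # Assumed definition: largest absolute gap relative to the largest CPU logit.
    ratio = max_abs_diff / max_abs_ref if max_abs_ref > 0 else max_abs_diff
    avg_kl = sum(kl_divergence(softmax(c), softmax(n)) for c, n in pairs) / len(pairs)
    return {
        "top1_matches": top1_matches,
        "total": len(pairs),
        "max_logit_diff_ratio": ratio,
        "avg_kl_divergence": avg_kl,
    }
```

Tensors from either device can be passed in after calling `.cpu().tolist()` on them.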

Inference performance:

| Metric | Value |
|---|---|
| Hardware | Ascend 910B |
| Average inference time | 6.95 ms |
| Test conditions | batch=8, max_length=128, fp32 |
| Runs | 50 |
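
An average latency like the one above is typically obtained by timing repeated forward passes after a few warm-up calls. The sketch below illustrates that pattern and is not necessarily the procedure used for the reported figure: the `benchmark` helper and the warm-up count of 5 are assumptions, and only the run count of 50 comes from the table.

```python
import time


def benchmark(step, runs=50, warmup=5, sync=None):
    """Return the average wall-clock time of `step()` in milliseconds.

    `sync` is an optional callable that blocks until queued device work
    has finished, so asynchronous execution does not distort the timing.
    """
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(runs):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs
```

With the model and inputs from the quick-start code, a call might look like `benchmark(lambda: model(**inputs), runs=50, sync=torch.npu.synchronize)`, assuming torch_npu is imported and the call runs under `torch.no_grad()`.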
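
The weight keys in the checkpoint carry a `network.bert.` prefix; strip this prefix before loading the weights into BertModel.
The weights can be loaded manually with BertConfig + BertModel.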
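
The prefix removal described above can be sketched as a small dictionary transform. This is a minimal illustration: the helper name `strip_prefix` is hypothetical, and keeping keys that lack the prefix unchanged is an assumption about the desired behavior.

```python
def strip_prefix(state_dict, prefix="network.bert."):
    """Return a new dict with `prefix` removed from every key that starts with it.

    Keys without the prefix are kept unchanged.
    """
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Toy example with placeholder values standing in for tensors.
raw = {
    "network.bert.embeddings.word_embeddings.weight": "tensor-0",
    "network.bert.encoder.layer.0.attention.self.query.weight": "tensor-1",
}
cleaned = strip_prefix(raw)
print(sorted(cleaned))
```

With transformers, the cleaned dictionary could then be passed to `BertModel(BertConfig.from_pretrained(model_name)).load_state_dict(cleaned, strict=False)`. How the raw state dict is read from the checkpoint file depends on its format, which the source does not state.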