Lajavaness_bilingual-embedding-small
1. Introduction
This document records the quick-deployment steps and validation results for Lajavaness/bilingual-embedding-small on an Ascend NPU (Ascend910) environment.
Related links:
2. Validation Environment
| Component | Version |
|---|---|
| torch | ≥2.1.0 |
| torch_npu | ≥2.1.0 |
| transformers | ≥4.37.0 |
| CANN | 8.5.RC1 |
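
Before loading the model, it can help to confirm that the installed packages meet the minimums in the table above. The following is a minimal sketch, not part of the original validation: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, the pip distribution names are assumed to match the table entries, and CANN is left out because it is not installed through pip. Pre-release suffixes such as `rc1` are ignored by this simple comparison.

```python
import re
from importlib import metadata

# Minimum versions copied from the table above (CANN is omitted: not a pip package).
MINIMUMS = {"torch": "2.1.0", "torch_npu": "2.1.0", "transformers": "4.37.0"}


def version_tuple(version: str) -> tuple:
    """Turn '2.1.0.post3' or '2.1.0+cpu' into (2, 1, 0); non-numeric suffixes are dropped."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad '2.1' to (2, 1, 0) so tuples compare evenly
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(minimums=MINIMUMS) -> dict:
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        print(f"{name}: {installed or 'not installed'} -> {'OK' if ok else 'CHECK'}")
```

Running the script prints one line per package; any line marked `CHECK` is worth resolving before moving on to deployment.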
3. Quick Deployment

```python
import torch
import torch_npu  # noqa: F401  (importing registers the "npu" device with PyTorch)
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions.
    token_embeddings = model_output.last_hidden_state
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


device = torch.device("npu:0")
model_name = "Lajavaness/bilingual-embedding-small"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(device).eval()

texts = ["Hello world"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    # L2-normalize so that dot products between embeddings equal cosine similarities.
    embedding = F.normalize(mean_pooling(outputs, inputs["attention_mask"]), p=2, dim=-1)

print(f"Embedding shape: {embedding.shape}")
```
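
To make the pooling step easier to follow, here is a small plain-Python illustration of the same arithmetic on a toy input, with no torch or NPU required. It is only a sketch: `masked_mean_pool` and `l2_normalize` are hypothetical names, and the toy vectors are made up for illustration rather than taken from the model.

```python
import math


def masked_mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask value is 1."""
    dim = len(token_vectors[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                sums[i] += value
    denom = max(count, 1e-9)  # same guard as torch.clamp(..., min=1e-9)
    return [s / denom for s in sums]


def l2_normalize(vector, eps=1e-12):
    """Scale the vector to unit length, as F.normalize(p=2) does."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


# Toy sequence of three token vectors; the last position is padding (mask 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean_pool(tokens, mask)  # the padding row does not contribute
embedding = l2_normalize(pooled)
print(pooled, embedding)
```

Because the padding row is masked out, the pooled vector is the mean of the first two rows only, which is why padding a batch to a common length does not change the resulting embeddings.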
4. Accuracy Evaluation
| Metric | Value |
|---|---|
| Max Logit Diff Ratio | 1e-06 |
| Min Cosine Similarity | 0.99999928 |
| Conclusion | PASS |
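
The document does not state how these two metrics are defined, so the sketch below is an assumption rather than the procedure actually used. It takes the minimum cosine similarity to be the worst per-sample cosine between NPU embeddings and reference embeddings (for example from a CPU run), and the max diff ratio to be the largest absolute element-wise difference divided by the largest absolute reference value. The function names and the toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def min_cosine_similarity(candidates, references):
    """Worst-case cosine similarity across paired embeddings."""
    return min(cosine_similarity(c, r) for c, r in zip(candidates, references))


def max_diff_ratio(candidates, references):
    """Assumed definition: max |candidate - reference| / max |reference|."""
    max_diff = max(
        abs(x - y) for c, r in zip(candidates, references) for x, y in zip(c, r)
    )
    max_ref = max(abs(y) for r in references for y in r)
    return max_diff / max_ref


# Toy data standing in for reference and NPU outputs (not real measurements).
reference = [[0.6, 0.8], [1.0, 0.0]]
candidate = [[0.6, 0.8], [1.0, 0.001]]

ratio = max_diff_ratio(candidate, reference)
min_cos = min_cosine_similarity(candidate, reference)
print(f"max diff ratio: {ratio:.6g}, min cosine similarity: {min_cos:.8f}")
```

In a real comparison, `candidate` and `reference` would be the embedding rows produced on the NPU and on the reference device for the same input texts.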
5. Performance Reference
| Metric | Value |
|---|---|
| Hardware | Ascend 910B |
| Average inference time | 7.62 ms |
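
The measurement method behind the average inference time is not described here. One common way to obtain such a figure is sketched below, assuming warm-up iterations followed by timed iterations; the helper name `average_latency_ms` and the default counts (`warmup=10`, `iters=100`) are hypothetical choices, not values from the original test.

```python
import time


def average_latency_ms(fn, warmup=10, iters=100, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds.

    sync is an optional callable that blocks until queued device work
    has finished; without it, asynchronous devices may be under-measured.
    """
    for _ in range(warmup):  # warm-up runs are excluded from the timing
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for pending device work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / iters


if __name__ == "__main__":
    # Stand-in CPU workload; replace with the model forward pass.
    print(f"{average_latency_ms(lambda: sum(range(1000))):.4f} ms")
```

For the model above, a plausible call would be `average_latency_ms(lambda: model(**inputs), sync=torch.npu.synchronize)` inside a `torch.no_grad()` block. Results depend on batch size, sequence length, and warm-up, so figures from different setups are not directly comparable.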