weixin_72661020/Lajavaness_bilingual-embedding-small

Lajavaness_bilingual-embedding-small

1. Introduction

This document records the quick deployment and validation results for Lajavaness/bilingual-embedding-small on Ascend NPU (Ascend 910).

Resources:

  • Weights download (ModelScope): https://modelscope.cn/models/Lajavaness/bilingual-embedding-small

2. Validation Environment

Component      Version
torch          ≥ 2.1.0
torch_npu      ≥ 2.1.0
transformers   ≥ 4.37.0
CANN           8.5.RC1
  • NPU: Ascend 910B (single card)
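The version table above can be checked before running the deployment script. The following is a minimal sketch that is not part of the original validation: it assumes the packages are installed under the distribution names `torch`, `torch_npu` and `transformers`, and the helpers `release_tuple`, `meets_minimum` and `check_environment` are hypothetical. The comparison is deliberately simplified (not full PEP 440): it looks only at the leading numeric release components, so a suffix such as `+cpu` or `rc1` is ignored. CANN is not a Python package and is not checked here.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the environment table above.
MINIMUMS = {
    "torch": "2.1.0",
    "torch_npu": "2.1.0",
    "transformers": "4.37.0",
}


def release_tuple(v: str) -> tuple:
    """Leading numeric release components, e.g. "2.1.0+cpu" -> (2, 1, 0).

    Simplified parser (an assumption): everything after the first
    non-numeric character is ignored.
    """
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # e.g. "0rc1" or "0+cpu": keep the number, then stop
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    # Pad with zeros so that "2.1" compares equal to "2.1.0".
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS) -> dict:
    """Map each package name to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for pkg, (ver, ok) in check_environment().items():
        print(f"{pkg:<14} {ver or 'not installed':<16} {'OK' if ok else 'FAIL'}")
```

A missing package is reported as failing rather than raising, so the whole table can be printed in one pass.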

3. Quick Deployment

import torch
import torch_npu  # noqa: F401  (registers the "npu" device type with PyTorch)
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions (mask == 0).
    token_embeddings = model_output.last_hidden_state
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


device = torch.device("npu:0")
model_name = "Lajavaness/bilingual-embedding-small"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(device).eval()

texts = ["Hello world"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over real tokens, then L2-normalize so a dot product equals cosine similarity.
embedding = F.normalize(mean_pooling(outputs, inputs["attention_mask"]), p=2, dim=-1)
print(f"Embedding shape: {embedding.shape}")
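The script above mean-pools the last hidden state over non-padding tokens and then L2-normalizes the result. The pure-Python sketch below reproduces that arithmetic on toy numbers (no model or NPU required) so the masking can be inspected by hand; the helper names `mean_pool`, `l2_normalize` and `cosine` are hypothetical. Because the vectors are unit-length after normalization, the cosine similarity of two sentence embeddings reduces to a plain dot product.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Masked mean: token_embeddings is [seq_len][dim], attention_mask is [seq_len] of 0/1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, m in zip(token_embeddings, attention_mask):
        if m:  # padding positions (mask == 0) contribute nothing
            summed = [s + x for s, x in zip(summed, vec)]
            count += 1
    count = max(count, 1e-9)  # same guard as torch.clamp(..., min=1e-9)
    return [s / count for s in summed]


def l2_normalize(vec, eps=1e-12):
    # Mirrors F.normalize(p=2): divide by max(norm, eps).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def cosine(a, b):
    # Valid as cosine similarity only when a and b are already unit-length.
    return sum(x * y for x, y in zip(a, b))


# Toy example: 3 token positions, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)  # [2.0, 3.0]: the padding row is ignored
emb = l2_normalize(pooled)
print(pooled, round(cosine(emb, emb), 6))
```

Without the mask, the padding row would pull the average towards `[100, 100]`, which is why the attention mask is passed to the pooling step.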

4. Accuracy Evaluation

Metric                   Value
Max Logit Diff Ratio     1e-06
Min Cosine Similarity    0.99999928
Conclusion               PASS
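The report does not define these two metrics or state which reference output the NPU result is compared against. The sketch below shows one plausible reading, stated as an assumption: NPU embeddings are compared row by row with a reference run (for example, the same script on CPU); "Max Logit Diff Ratio" is taken as the largest absolute element-wise difference divided by the largest absolute reference value; and "Min Cosine Similarity" is the smallest per-row cosine similarity. The function `compare_outputs` and its pass thresholds are hypothetical.

```python
import math


def compare_outputs(npu_rows, ref_rows, max_ratio=1e-3, min_cos=0.99):
    """Compare two batches of vectors ([batch][dim]); return (ratio, min cosine, passed).

    Metric definitions and default thresholds are assumptions for illustration.
    """
    max_abs_diff = max(abs(a - b) for ra, rb in zip(npu_rows, ref_rows) for a, b in zip(ra, rb))
    max_abs_ref = max(abs(b) for rb in ref_rows for b in rb)
    ratio = max_abs_diff / max(max_abs_ref, 1e-12)

    cosines = []
    for ra, rb in zip(npu_rows, ref_rows):
        dot = sum(a * b for a, b in zip(ra, rb))
        na = math.sqrt(sum(a * a for a in ra))
        nb = math.sqrt(sum(b * b for b in rb))
        cosines.append(dot / max(na * nb, 1e-12))
    worst_cos = min(cosines)

    passed = ratio <= max_ratio and worst_cos >= min_cos
    return ratio, worst_cos, passed


ref = [[0.6, 0.8], [1.0, 0.0]]
npu = [[0.6, 0.8000004], [1.0, 0.0]]
ratio, cos_min, ok = compare_outputs(npu, ref)
print(f"Max diff ratio: {ratio:.1e}  Min cosine: {cos_min:.8f}  {'PASS' if ok else 'FAIL'}")
```

Taking the minimum cosine over rows, rather than the mean, makes a single badly diverging sample visible.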

5. Performance Reference

Metric                   Value
Hardware                 Ascend 910B
Average inference time   7.62 ms
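The report gives a single average latency but does not state the batch size, sequence length, number of warm-up runs, or how the timing was taken. A common approach, sketched here under those unstated assumptions, is to run a few untimed warm-up iterations and then time many iterations with `time.perf_counter()`. A device synchronization is called before each clock read so queued asynchronous NPU work is counted. On Ascend, `torch.npu.synchronize` (available once `torch_npu` is imported) can be passed as `sync`. The helper `average_latency_ms` and its default iteration counts are hypothetical.

```python
import time
from typing import Callable, Optional


def average_latency_ms(fn: Callable[[], object], warmup: int = 10, iters: int = 100,
                       sync: Optional[Callable[[], None]] = None) -> float:
    """Average wall-clock time of fn() in milliseconds (hypothetical helper)."""
    if iters <= 0:
        raise ValueError("iters must be positive")
    for _ in range(warmup):  # untimed runs: kernel compilation, memory allocation, caches
        fn()
    if sync is not None:
        sync()  # ensure warm-up work has finished before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for all queued device work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / iters


if __name__ == "__main__":
    # CPU-only demo. On the NPU one might call, inside torch.no_grad():
    #   average_latency_ms(lambda: model(**inputs), sync=torch.npu.synchronize)
    demo = average_latency_ms(lambda: sum(range(10_000)), warmup=2, iters=20)
    print(f"Average inference time: {demo:.2f} ms")
```

Synchronizing only at the start and end, rather than after every call, measures throughput-style latency; synchronizing inside the loop would instead measure per-call latency including launch overhead.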