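mxbai-rerank-xsmall-v1 is a document reranking (reranker) model developed by Mixedbread AI, built on the DebertaV2 architecture. It scores how relevant each retrieved document is to a query so the documents can be reordered, which improves the accuracy of search and RAG systems. It is the extra-small variant, suited to resource-constrained environments.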
```text
mxbai-rerank-xsmall-v1-ascend/
├── inference.py           # inference test script
├── log.txt                # test log
├── README.md              # this document
├── test_sample.txt        # test sample
├── inference_result.json  # inference results
└── precision_result.json  # precision test results
```

Enter the container and load the Ascend toolkit environment:

```bash
docker exec -it test-modelagent bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

The model files are located in `/data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1/mixedbread-ai/mxbai-rerank-xsmall-v1/`.
Install the dependencies:

```bash
pip install transformers torch_npu -i https://pypi.huaweicloud.com/repository/pypi/simple/
```

Run the inference script for document reranking:

```bash
cd /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/
python3 inference.py --mode inference
```

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:
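```bash
cd /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/
python3 inference.py --mode precision_test
```

Run the full suite (inference, precision test, and test-sample generation):

```bash
cd /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/
python3 inference.py --mode all
```

| Parameter | Description | Default |
|---|---|---|
| `--mode` | Test mode: `inference`, `precision_test`, or `all` | `all` |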
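The source of `inference.py` is not reproduced in this README. As a rough illustration of how the documented `--mode` option behaves (three allowed values, default `all`), here is a hypothetical `argparse` definition that mirrors the parameter table; the real script may define it differently.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the --mode option from the parameter table;
    # NOT taken from the actual inference.py source.
    parser = argparse.ArgumentParser(description="mxbai-rerank-xsmall-v1 NPU test")
    parser.add_argument(
        "--mode",
        choices=["inference", "precision_test", "all"],  # argparse rejects other values
        default="all",
        help="Test mode: inference, precision_test, or all",
    )
    return parser
```

For example, `build_parser().parse_args(["--mode", "precision_test"]).mode` yields `"precision_test"`, and omitting the flag falls back to `"all"`.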
| Metric | Measured | Threshold | Status |
|---|---|---|---|
| Max relative error | 0.1898% | < 1.00% | PASS |
| Max absolute error | 7.81e-03 | - | - |
| CPU inference time | 1.495s | - | - |
| NPU inference time | 0.041s | - | - |
| Speedup | 36.86x | > 1x | PASS |
| Score consistency | Identical (atol=1e-4) | - | PASS |
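The README does not define the error metrics, so the sketch below assumes the usual definitions: element-wise absolute error, relative error taken against the CPU value, and a pass when the maximum relative error is under 1%. Note that the table reports both a maximum absolute error of 7.81e-03 and a score match at atol=1e-4, so the error metrics were presumably computed on a different tensor than the final scores (for example the raw logits); the script source would confirm this. The helper name `compare_outputs` is hypothetical.

```python
def compare_outputs(cpu_values, npu_values, rel_threshold=0.01, atol=1e-4):
    """Compare CPU and NPU outputs element-wise (metric definitions are assumptions)."""
    if len(cpu_values) != len(npu_values):
        raise ValueError("CPU and NPU outputs must have the same length")
    abs_errors = [abs(c - n) for c, n in zip(cpu_values, npu_values)]
    # Relative error against the CPU reference; the floor avoids division by zero
    rel_errors = [e / max(abs(c), 1e-12) for e, c in zip(abs_errors, cpu_values)]
    max_rel = max(rel_errors)
    return {
        "max_abs_error": max(abs_errors),
        "max_rel_error": max_rel,
        "scores_match": all(e <= atol for e in abs_errors),
        "passed": max_rel < rel_threshold,
    }
```

With inputs `[4.0, -2.0]` (CPU) and `[4.0078125, -2.0]` (NPU), the maximum absolute error is 0.0078125 and the maximum relative error about 0.195%, so the 1% check passes while an atol=1e-4 match would not.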
| Operation | Time |
|---|---|
| NPU inference (3 documents) | 0.656s |
| Precision test, CPU | 1.495s |
| Precision test, NPU | 0.041s |
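The inference run and the precision test report quite different NPU times (0.656s vs. 0.041s), and the README does not say what each measurement covers; one-time first-call costs are a plausible but unconfirmed factor. Likewise, 1.495 / 0.041 ≈ 36.5, so the reported 36.86x was presumably computed from unrounded timings. For reproducible numbers, a common practice is to run warm-up calls first and average several timed repeats. The sketch below does that for any callable. The helper names are hypothetical, and on an Ascend device you would pass a synchronize function as `sync` (for example `torch.npu.synchronize`, assuming torch_npu's CUDA-style API) so asynchronous kernels are included in the measurement.

```python
import time
from statistics import mean


def time_call(fn, *, warmup=1, repeats=5, sync=None):
    """Return the mean wall-clock time (seconds) of fn() after warm-up calls.

    `sync` is an optional callable (e.g. a device synchronize function) invoked
    before each clock read so that queued asynchronous work is counted.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # warm-up: absorbs one-time costs such as lazy initialization
    if sync is not None:
        sync()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def speedup(baseline_seconds, accelerated_seconds):
    """Speedup factor of the accelerated run relative to the baseline."""
    return baseline_seconds / accelerated_seconds
```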
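Query: "Who wrote 'To Kill a Mockingbird'?"

| Rank | Relevance score | Document (excerpt) |
|---|---|---|
| 1 | 0.9946 | 'To Kill a Mockingbird' is a novel by Harper Lee... |
| 2 | 0.9839 | Harper Lee, an American novelist widely known... |
| 3 | 0.5010 | The novel 'Moby-Dick' was written by Herman Melville... |

Result: the model correctly identifies Harper Lee as the author of *To Kill a Mockingbird*. Both passages about her score above 0.98, while the unrelated *Moby-Dick* passage ranks last at 0.5010.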
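The ranking above is a descending sort of the per-document scores printed in the log (`Scores: [0.99462890625, 0.5009765625, 0.98388671875]`, in input order), and each score is the sigmoid of the model's single output logit, as in the usage code. A stdlib-only sketch of that post-processing, with hypothetical helper names:

```python
import math


def sigmoid(logit):
    """Map a raw logit to a (0, 1) relevance score, avoiding overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def rank_by_score(scores, top_k=None):
    """Return document indices sorted by descending score, optionally truncated to top_k."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order if top_k is None else order[:top_k]


# Scores from the log, in the original document order
scores = [0.99462890625, 0.5009765625, 0.98388671875]
print(rank_by_score(scores))  # [0, 2, 1]
```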
```text
============================================================
mxbai-rerank-xsmall-v1 NPU Test
Model: mixedbread-ai/mxbai-rerank-xsmall-v1
Output: /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend
============================================================
============================================================
mxbai-rerank-xsmall-v1 Inference Test (NPU)
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1/mixedbread-ai/mxbai-rerank-xsmall-v1
Loading tokenizer...
Loading model...
Loading weights: 100%|██████████| 202/202 [00:00<00:00, 5362.81it/s]
Model loaded successfully
Query: Who wrote 'To Kill a Mockingbird'?
Documents: 3
Input shape: torch.Size([3, 48])
Logits shape: torch.Size([3, 1])
Scores: [0.99462890625, 0.5009765625, 0.98388671875]
Inference time: 0.656s
Reranked results:
1. [score=0.9946] 'To Kill a Mockingbird' is a novel by Harper Lee published i...
2. [score=0.9839] Harper Lee, an American novelist widely known for her novel ...
3. [score=0.5010] The novel 'Moby-Dick' was written by Herman Melville and fir...
Inference result saved to /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/inference_result.json
============================================================
Precision Test (CPU vs NPU)
============================================================
Using device: npu:0
Loading tokenizer...
Loading model on CPU...
Loading weights: 100%|██████████| 202/202 [00:00<00:00, 4532.44it/s]
Loading model on npu:0...
Loading weights: 100%|██████████| 202/202 [00:00<00:00, 4531.76it/s]
Running inference on CPU...
Running inference on NPU...
CPU inference time: 1.495s
NPU inference time: 0.041s
Speedup: 36.86x
Max absolute error: 7.812500e-03
Max relative error: 0.1898% (threshold: 1.0%)
CPU score: 0.983887
NPU score: 0.983887
Scores match (atol=1e-4): True
Status: PASS
Precision result saved to /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/precision_result.json
============================================================
Creating Test Sample
============================================================
Saved test sample: /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/test_sample.txt
============================================================
Test Complete!
============================================================
```

```python
import torch
import torch_npu  # registers the "npu" device with PyTorch; needed for .to("npu:0")
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "/data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1/mixedbread-ai/mxbai-rerank-xsmall-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model = model.to("npu:0").eval()

query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960.",
    "The novel 'Moby-Dick' was written by Herman Melville.",
    "Harper Lee wrote 'To Kill a Mockingbird' and was born in 1926.",
]

# The cross-encoder scores each (query, document) pair jointly
pairs = [[query, doc] for doc in documents]
inputs = tokenizer(pairs, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
# logits has shape [num_docs, 1]; sigmoid maps each logit to a 0-1 relevance score
scores = outputs.logits.squeeze(-1).sigmoid()

# Print documents from most to least relevant
sorted_indices = torch.argsort(scores, descending=True).tolist()
for rank, idx in enumerate(sorted_indices, 1):
    print(f"{rank}. {documents[idx]} (score: {scores[idx].item():.4f})")
```

```python
def rerank_documents(query, retrieved_docs, top_k=3):
    """Rerank retrieved documents and return the top_k (document, score) pairs.

    Reuses the `tokenizer` and `model` loaded above (model already on npu:0).
    """
    pairs = [[query, doc] for doc in retrieved_docs]
    inputs = tokenizer(pairs, return_tensors="pt", padding=True, truncation=True, max_length=512)
    inputs = {k: v.to("npu:0") for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    scores = outputs.logits.squeeze(-1).sigmoid()
    # Keep only the top_k highest-scoring documents
    sorted_indices = torch.argsort(scores, descending=True)[:top_k].tolist()
    return [(retrieved_docs[i], scores[i].item()) for i in sorted_indices]
```

| Component | Description |
|---|---|
| embeddings | DebertaV2 token embeddings |
| encoder | 12-layer Transformer encoder |
| pooler | Context pooler: pools the first-token ([CLS]) hidden state for the classification head |
| classifier | Sequence-classification head (outputs one relevance logit; sigmoid turns it into the score) |
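The layer sizes behind this table come from config.json (excerpted in this README): `hidden_size` 384 split across `num_attention_heads` 6 gives each attention head 64 dimensions, and `intermediate_size` 1536 means the feed-forward layer expands to 4x the hidden size. A small sketch that derives these numbers from the config fields; the helper name `attention_geometry` is hypothetical.

```python
import json

# Values copied from the config.json excerpt in this README
CONFIG_JSON = """
{
  "hidden_size": 384,
  "num_hidden_layers": 12,
  "num_attention_heads": 6,
  "intermediate_size": 1536
}
"""


def attention_geometry(config):
    """Derive per-head size and FFN expansion ratio from a DeBERTa-style config."""
    hidden = config["hidden_size"]
    heads = config["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return {
        "head_dim": hidden // heads,
        "ffn_ratio": config["intermediate_size"] / hidden,
        "layers": config["num_hidden_layers"],
    }


print(attention_geometry(json.loads(CONFIG_JSON)))
# {'head_dim': 64, 'ffn_ratio': 4.0, 'layers': 12}
```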
Key parameters extracted from config.json:

```json
{
  "model_type": "deberta-v2",
  "hidden_size": 384,
  "num_hidden_layers": 12,
  "num_attention_heads": 6,
  "intermediate_size": 1536,
  "vocab_size": 128100,
  "max_position_embeddings": 512,
  "attention_probs_dropout_prob": 0.1,
  "hidden_dropout_prob": 0.1
}
```

A: Check that the NPU driver is installed correctly. The DebertaV2 model's outputs on CPU and NPU are nearly identical, with a very small error (maximum relative error 0.19%).
A: Batch processing can substantially improve throughput. NPU inference is already fast (0.041s on NPU vs. 1.495s on CPU in the precision test).
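For large candidate sets, one way to batch is to split the (query, document) pairs into fixed-size chunks and score each chunk in one forward pass, the way `rerank_documents` scores a single list. A stdlib sketch of the chunking step; the helper name and the default `batch_size` are assumptions to tune against NPU memory.

```python
from itertools import islice


def batched_pairs(query, documents, batch_size=32):
    """Yield lists of [query, document] pairs, at most batch_size pairs per list.

    Being a generator, the batch_size check runs on the first iteration.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(documents)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield [[query, doc] for doc in chunk]
```

Each yielded batch can be passed to `tokenizer(batch, ...)` and the model; concatenating the per-batch scores in order and then sorting once gives the global ranking.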
A: This model is designed for English document reranking. For multilingual support, see Mixedbread AI's other models.
This project is licensed under the Apache-2.0 License.