gcw_IDzXRVNw/mxbai-rerank-xsmall-v1-ascend

mxbai-rerank-xsmall-v1 Ascend NPU Deployment Guide

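Project Overview

mxbai-rerank-xsmall-v1 is a document reranker model developed by Mixedbread AI, built on the DebertaV2 architecture. It scores retrieved documents by their relevance to a query so that they can be reordered, which improves the accuracy of search and RAG systems. It is a small-footprint variant, suited to resource-constrained environments.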

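Features

  • Inference acceleration on Ascend NPU
  • CPU vs NPU precision comparison test (scores match within atol=1e-4 in the recorded run)
  • DebertaV2 sequence classifier
  • 36.86x speedup over CPU in the recorded precision test (1.495s vs 0.041s)
  • High-precision document reranking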

Requirements

  • Hardware: Huawei Ascend 910 series NPU
  • CANN: 8.0.RC1 or later
  • PyTorch: 2.0+ with torch_npu
  • Docker: container named test-modelagent
  • transformers: 4.38+
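
The version requirements above can be checked from inside the container before running anything heavier. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and only the `torch` and `transformers` minimums come from the list above (no minimum is stated for torch_npu, so it is only checked for presence).

```python
from importlib import metadata
import re

# Minimum versions taken from the requirements list above.
MINIMUMS = {"torch": (2, 0), "transformers": (4, 38)}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum tuple."""
    return parse_version(installed) >= minimum


def check_environment():
    """Map each package name to (installed version or None, requirement met?)."""
    report = {}
    for name in ("torch", "torch_npu", "transformers"):
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        minimum = MINIMUMS.get(name)
        report[name] = (version, meets_minimum(version, minimum) if minimum else True)
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_environment().items():
        print(f"{name}: {version or 'not installed'} ({'OK' if ok else 'check'})")
```

The check only inspects installed package metadata; it does not verify the CANN version or that an NPU is actually reachable.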

Directory Structure

mxbai-rerank-xsmall-v1-ascend/
├── inference.py          # Inference test script
├── log.txt               # Test log
├── README.md             # This document
├── test_sample.txt       # Test sample
├── inference_result.json # Inference results
└── precision_result.json # Precision test results

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

3. Prepare the model files

The model files are located in /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1/mixedbread-ai/mxbai-rerank-xsmall-v1/:

  • model.safetensors - model weights (about 142MB)
  • config.json - model configuration
  • tokenizer.json / tokenizer_config.json - tokenizer files
  • spm.model - SentencePiece model
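
Before loading the model, it can help to confirm that every file in the list above is present. The sketch below is illustrative only: `missing_model_files` and `REQUIRED_FILES` are hypothetical names, and the file list is copied from the bullets above.

```python
from pathlib import Path

# File names taken from the list above; tokenizer.json and
# tokenizer_config.json are treated as two separate required files.
REQUIRED_FILES = (
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "spm.model",
)


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_model_files` with the model directory given above returns an empty list when all files are in place.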

4. Install dependencies

pip install transformers torch_npu -i https://pypi.huaweicloud.com/repository/pypi/simple/

Usage

Method 1: Normal Inference Mode

Run the inference script for document reranking:

cd /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/

python3 inference.py --mode inference

Method 2: Precision Test Mode (CPU vs NPU)

Run the precision comparison test to verify that NPU results are consistent with CPU results:

cd /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/

python3 inference.py --mode precision_test

Method 3: Full Test (Inference + Precision)

cd /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/

python3 inference.py --mode all

Command-Line Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| --mode | Test mode: inference, precision_test, or all | all |
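
The source of inference.py is not reproduced in this document, so the following is only a sketch of how a `--mode` argument with these three values and this default could be declared with `argparse`. The names `build_parser` and `steps_for` are hypothetical, as is the assumption that `all` runs inference first and the precision test second (the order seen in the test log).

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented --mode argument."""
    parser = argparse.ArgumentParser(description="mxbai-rerank-xsmall-v1 NPU test")
    parser.add_argument(
        "--mode",
        choices=["inference", "precision_test", "all"],
        default="all",
        help="which test to run",
    )
    return parser


def steps_for(mode):
    """Expand a mode into the ordered list of steps it implies (assumed order)."""
    return ["inference", "precision_test"] if mode == "all" else [mode]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Steps to run:", steps_for(args.mode))
```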

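Test Verification

Precision Test Results

| Metric | Measured | Threshold | Status |
|--------|----------|-----------|--------|
| Max relative error | 0.1898% | < 1.00% | PASS |
| Max absolute error | 7.81e-03 | - | - |
| CPU inference time | 1.495s | - | - |
| NPU inference time | 0.041s | - | - |
| Speedup | 36.86x | > 1x | PASS |
| Score consistency | Match (atol=1e-4) | - | PASS |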
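
The metrics in this table can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions: the exact formulas used by inference.py are not shown in this document, so the relative error is assumed to be the absolute difference divided by the magnitude of the CPU value, and the sample values below are made up for illustration.

```python
def max_abs_error(reference, candidate):
    """Largest absolute difference between paired values."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


def max_rel_error(reference, candidate, eps=1e-12):
    """Largest absolute difference relative to the reference magnitude (assumed definition)."""
    return max(abs(r - c) / max(abs(r), eps) for r, c in zip(reference, candidate))


def speedup(cpu_seconds, npu_seconds):
    """How many times faster the NPU run was than the CPU run."""
    return cpu_seconds / npu_seconds


# Hypothetical outputs, not taken from the test log.
cpu_values = [2.0, -1.0, 4.0]
npu_values = [2.0, -1.0, 4.008]

rel = max_rel_error(cpu_values, npu_values)
print(f"max abs error: {max_abs_error(cpu_values, npu_values):.3e}")
print(f"max rel error: {rel:.4%} -> {'PASS' if rel < 0.01 else 'FAIL'}")  # 1.00% threshold
print(f"speedup from rounded timings: {speedup(1.495, 0.041):.2f}x")
```

Dividing the rounded timings in the table gives roughly 36.5x, slightly below the logged 36.86x; the logged figure was presumably computed from the unrounded timings.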

Performance Data

| Operation | Time |
|-----------|------|
| NPU inference (3 documents) | 0.656s |
| Precision test, CPU | 1.495s |
| Precision test, NPU | 0.041s |

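Reranking Example

Query: "Who wrote 'To Kill a Mockingbird'?"

| Rank | Relevance score | Document excerpt |
|------|-----------------|------------------|
| 1 | 0.9946 | 'To Kill a Mockingbird' is a novel by Harper Lee... |
| 2 | 0.9839 | Harper Lee, an American novelist widely known... |
| 3 | 0.5010 | The novel 'Moby-Dick' was written by Herman Melville... |

Result: The model correctly identifies Harper Lee as the author of 'To Kill a Mockingbird'; the passage stating this receives the highest relevance score.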
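
The ranking follows directly from sorting the per-document scores in descending order. The short sketch below reproduces that step using the three scores recorded in the test log, listed in the original document order; `rank_by_score` is a hypothetical helper name.

```python
# Scores in original document order, copied from the test log.
scores = [0.99462890625, 0.5009765625, 0.98388671875]


def rank_by_score(values):
    """Return document indices sorted from most to least relevant."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


order = rank_by_score(scores)
for rank, idx in enumerate(order, 1):
    print(f"{rank}. doc[{idx}] score={scores[idx]:.4f}")
```

The second input document (index 1, the Moby-Dick passage) ends up last, which matches the order in the table.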

Test Log

============================================================
mxbai-rerank-xsmall-v1 NPU Test
Model: mixedbread-ai/mxbai-rerank-xsmall-v1
Output: /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend
============================================================

============================================================
mxbai-rerank-xsmall-v1 Inference Test (NPU)
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1/mixedbread-ai/mxbai-rerank-xsmall-v1
Loading tokenizer...
Loading model...
Loading weights: 100%|██████████| 202/202 [00:00<00:00, 5362.81it/s]
Model loaded successfully
Query: Who wrote 'To Kill a Mockingbird'?
Documents: 3
Input shape: torch.Size([3, 48])
Logits shape: torch.Size([3, 1])
Scores: [0.99462890625, 0.5009765625, 0.98388671875]
Inference time: 0.656s
Reranked results:
  1. [score=0.9946] 'To Kill a Mockingbird' is a novel by Harper Lee published i...
  2. [score=0.9839] Harper Lee, an American novelist widely known for her novel ...
  3. [score=0.5010] The novel 'Moby-Dick' was written by Herman Melville and fir...

Inference result saved to /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/inference_result.json

============================================================
Precision Test (CPU vs NPU)
============================================================
Using device: npu:0
Loading tokenizer...
Loading model on CPU...
Loading weights: 100%|██████████| 202/202 [00:00<00:00, 4532.44it/s]
Loading model on npu:0...
Loading weights: 100%|██████████| 202/202 [00:00<00:00, 4531.76it/s]
Running inference on CPU...
Running inference on NPU...
CPU inference time: 1.495s
NPU inference time: 0.041s
Speedup: 36.86x
Max absolute error: 7.812500e-03
Max relative error: 0.1898% (threshold: 1.0%)
CPU score: 0.983887
NPU score: 0.983887
Scores match (atol=1e-4): True
Status: PASS

Precision result saved to /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/precision_result.json

============================================================
Creating Test Sample
============================================================
Saved test sample: /data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1-ascend/test_sample.txt

============================================================
Test Complete!
============================================================

Python API Usage Examples

Basic Reranking

import torch
import torch_npu  # importing torch_npu registers the "npu" device with PyTorch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "/data/ysws/agentsp/5-16/mxbai-rerank-xsmall-v1/mixedbread-ai/mxbai-rerank-xsmall-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)

model = model.to("npu:0").eval()

query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960.",
    "The novel 'Moby-Dick' was written by Herman Melville.",
    "Harper Lee wrote 'To Kill a Mockingbird' and was born in 1926."
]

pairs = [[query, doc] for doc in documents]
inputs = tokenizer(pairs, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

scores = outputs.logits.squeeze(-1).sigmoid()
sorted_indices = torch.argsort(scores, descending=True).tolist()

for rank, idx in enumerate(sorted_indices, 1):
    print(f"{rank}. {documents[idx]} (score: {scores[idx].item():.4f})")
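
The `.sigmoid()` call in the example maps each raw logit to a relevance score between 0 and 1. The plain-Python sketch below illustrates that mapping and its inverse; it is a naive formulation for illustration (it can overflow for very large negative inputs), and the logit values in `example_logits` are made up.

```python
import math


def sigmoid(x):
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of sigmoid: recover the raw logit from a score in (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical logits for three documents.
example_logits = [5.2, 0.0, 4.1]
example_scores = [sigmoid(x) for x in example_logits]

# Sigmoid is monotonic, so ranking by logit and ranking by score agree.
by_logit = sorted(range(3), key=lambda i: example_logits[i], reverse=True)
by_score = sorted(range(3), key=lambda i: example_scores[i], reverse=True)
print(by_logit == by_score)
```

A logit of 0 corresponds to a score of exactly 0.5, so a score close to 0.5 (such as the 0.5010 recorded for the Moby-Dick passage) indicates a logit close to zero.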

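Using in a RAG System

def rerank_documents(query, retrieved_docs, top_k=3):
    """Return the top_k (document, score) pairs, most relevant first.

    Relies on the tokenizer and model loaded in the basic example above.
    """
    # Pair the query with every candidate document.
    pairs = [[query, doc] for doc in retrieved_docs]
    inputs = tokenizer(pairs, return_tensors="pt", padding=True, truncation=True, max_length=512)
    inputs = {k: v.to("npu:0") for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    # Convert logits to scores in (0, 1) and keep the indices of the best top_k.
    scores = outputs.logits.squeeze(-1).sigmoid()
    sorted_indices = torch.argsort(scores, descending=True)[:top_k].tolist()

    return [(retrieved_docs[i], scores[i].item()) for i in sorted_indices]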

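Model Architecture

  • Architecture: DebertaV2ForSequenceClassification
  • Encoder layers: 12
  • Hidden size: 384
  • Attention heads: 6
  • Feed-forward size: 1536
  • Vocabulary size: 128100

| Component | Description |
|-----------|-------------|
| embeddings | DebertaV2 word embeddings |
| encoder | 12-layer Transformer encoder |
| pooler | Pooling layer that produces the representation passed to the classifier |
| classifier | Sequence classification head (outputs the relevance logit) |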

Inference Configuration

Key parameters extracted from config.json:

{
  "model_type": "deberta-v2",
  "hidden_size": 384,
  "num_hidden_layers": 12,
  "num_attention_heads": 6,
  "intermediate_size": 1536,
  "vocab_size": 128100,
  "max_position_embeddings": 512,
  "attention_probs_dropout_prob": 0.1,
  "hidden_dropout_prob": 0.1
}
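
A few derived quantities follow from these values by simple arithmetic. The sketch below parses the same JSON fragment and computes them; the embedding parameter count assumes the word-embedding width equals `hidden_size`, which is an assumption since no separate embedding size appears in the extract above.

```python
import json

# The same key parameters as listed above.
CONFIG_JSON = """
{
  "model_type": "deberta-v2",
  "hidden_size": 384,
  "num_hidden_layers": 12,
  "num_attention_heads": 6,
  "intermediate_size": 1536,
  "vocab_size": 128100,
  "max_position_embeddings": 512,
  "attention_probs_dropout_prob": 0.1,
  "hidden_dropout_prob": 0.1
}
"""

config = json.loads(CONFIG_JSON)

# Width of each attention head: 384 / 6 = 64.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Feed-forward expansion factor: 1536 / 384 = 4.
ffn_ratio = config["intermediate_size"] // config["hidden_size"]

# Word-embedding matrix size, assuming embedding width == hidden_size.
embedding_params = config["vocab_size"] * config["hidden_size"]

print(head_dim, ffn_ratio, f"{embedding_params:,}")
```

The `max_position_embeddings` value of 512 matches the `max_length=512` passed to the tokenizer in the usage examples.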

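FAQ

Q: The precision test fails. What should I check?

A: Check that the NPU driver is installed correctly. In the recorded run, the DebertaV2 model's CPU and NPU outputs agree closely, with a maximum relative error of about 0.19%.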
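
A quick way to tell whether the NPU is visible to PyTorch at all is to try importing torch_npu and querying availability. The following is a minimal sketch, assuming torch_npu is installed as described in the requirements; `pick_device` is a hypothetical helper that falls back to the CPU when the import fails or no device is reported.

```python
def pick_device():
    """Return "npu:0" when an Ascend NPU is usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the NPU backend)
    except (ImportError, OSError):
        # torch or torch_npu is missing, or its native libraries failed to load.
        return "cpu"
    return "npu:0" if torch.npu.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

If this prints `cpu` inside the container, re-running the environment setup script from step 2 and checking the driver installation is a reasonable first step.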

Q: How can I speed up reranking?

A: Batching can significantly increase throughput. In the recorded precision test, NPU inference took 0.041s versus 1.495s on CPU.
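
One way to apply batching is to split a long candidate list into fixed-size chunks, score each chunk in a single forward pass, and sort once at the end. The sketch below shows only the chunking and merging logic. `score_fn` is a hypothetical callable that takes a query and a list of documents and returns one score per document (in practice it would wrap the tokenizer and model calls shown earlier), and the default `batch_size=32` is an arbitrary illustrative value, not a tuned recommendation.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def rerank_in_batches(query, docs, score_fn, batch_size=32, top_k=3):
    """Score docs chunk by chunk, then return the top_k (doc, score) pairs."""
    scores = []
    for chunk in batched(docs, batch_size):
        # score_fn is assumed to return one float per document in the chunk.
        scores.extend(score_fn(query, chunk))
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [(docs[i], scores[i]) for i in order]


if __name__ == "__main__":
    # Stand-in scorer for demonstration: longer documents score higher.
    def fake_score(query, chunk):
        return [float(len(doc)) for doc in chunk]

    print(rerank_in_batches("q", ["a", "bbb", "cc", "dddd", "e"], fake_score, batch_size=2, top_k=2))
```

Larger batches generally mean fewer device round trips, at the cost of more memory per call; a suitable size depends on document length and available NPU memory.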

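Q: Does the model support multiple languages?

A: This model targets English document reranking. For multilingual support, look for other models from Mixedbread AI.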

References

  • Original model: https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1
  • Mixedbread AI: https://mixedbread.com
  • DeBERTa paper: https://arxiv.org/abs/2006.03654
  • HuggingFace Transformers: https://huggingface.co/transformers

License

This project is released under the Apache-2.0 license.