cde-small-v1 Ascend NPU Deployment Guide

Project Overview

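cde-small-v1 is a text embedding model that performs strongly on the MTEB benchmark. It supports a range of text classification and embedding tasks, including Amazon review classification and sentiment analysis.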

Features

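Inference acceleration on Ascend NPUs
CPU vs NPU precision comparison test
Multi-task support (classification, embedding, retrieval)
768-dimensional embedding output
BERT-style architecture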

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
transformers: 4.8+
sentence-transformers (required to load the full model)

Directory Structure

cde-small-v1-ascend/
├── inference.py          # Inference test script
├── test.log              # Test log
└── README.md             # This document

Important Notes

cde-small-v1 uses nomic-ai/nomic-bert-2048 as its embedder. Because of network restrictions, the embedder must be downloaded to a local directory in advance.

The current implementation loads the model by manually reading the local safetensors weights and remapping the keys that carry the bert. prefix.
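A minimal sketch of the key remapping described above, assuming the local checkpoint stores the embedder weights under a `bert.` prefix that the target module does not expect. The direction of the mapping (stripping `bert.` rather than adding it), the helper names `remap_prefix` and `load_local_embedder_weights`, and the file path are hypothetical; only `safetensors.torch.load_file` is a real library call.

```python
from typing import Any, Dict


def remap_prefix(
    state_dict: Dict[str, Any], old_prefix: str = "bert.", new_prefix: str = ""
) -> Dict[str, Any]:
    """Return a copy of state_dict with old_prefix replaced by new_prefix.

    Hypothetical helper: assumes checkpoint keys start with "bert." while the
    model expects them without it (the mapping direction is an assumption).
    """
    remapped: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = new_prefix + key[len(old_prefix):] if key.startswith(old_prefix) else key
        if new_key in remapped:
            # Two source keys map to the same target; refuse to overwrite silently.
            raise ValueError(f"key collision after remapping: {new_key}")
        remapped[new_key] = value
    return remapped


def load_local_embedder_weights(path: str) -> Dict[str, Any]:
    """Load a local .safetensors file and remap its keys (path is hypothetical)."""
    # Imported lazily so remap_prefix works without safetensors installed.
    from safetensors.torch import load_file

    return remap_prefix(load_file(path))
```

The remapped dict can then be passed to PyTorch's `model.load_state_dict(weights, strict=False)`, which returns the missing and unexpected keys; checking that both lists are empty (or contain only expected entries) is a quick way to confirm the prefix mapping is right.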

Usage

Option 1: Standard inference mode

cd /opt/atomgit/mxy/cde-small-v1-ascend/

python3 inference.py --mode inference --device npu:0

Option 2: Precision test mode (CPU vs NPU)

python3 inference.py --mode precision_test --device npu:0

Command-Line Arguments

Argument	Description	Default
`--mode`	Test mode: inference or precision_test	`inference`
`--device`	Device to run on: npu:0, cuda:0, cpu, or auto	`auto`
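The two flags above map naturally onto Python's `argparse`. The sketch below is a hypothetical reconstruction, not the actual code in inference.py: the function names, the help strings, and in particular the order in which `auto` prefers NPU, then CUDA, then CPU are assumptions. It relies on the fact that importing `torch_npu` registers the `torch.npu` namespace.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented CLI flags."""
    parser = argparse.ArgumentParser(description="cde-small-v1 inference (sketch)")
    parser.add_argument(
        "--mode",
        choices=["inference", "precision_test"],
        default="inference",
        help="inference: embed sample sentences; precision_test: compare CPU vs NPU",
    )
    parser.add_argument("--device", default="auto", help="npu:0, cuda:0, cpu, or auto")
    return parser


def detect_backends() -> Tuple[bool, bool]:
    """Return (npu_available, cuda_available); both False if torch is missing."""
    try:
        import torch
    except ImportError:
        return False, False
    npu_available = False
    try:
        import torch_npu  # noqa: F401  (importing registers torch.npu)
        npu_available = bool(torch.npu.is_available())
    except ImportError:
        pass
    return npu_available, bool(torch.cuda.is_available())


def resolve_device(requested: str, npu_available: bool, cuda_available: bool) -> str:
    """Map "auto" to a concrete device; the NPU > CUDA > CPU order is assumed."""
    if requested != "auto":
        return requested
    if npu_available:
        return "npu:0"
    if cuda_available:
        return "cuda:0"
    return "cpu"


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    device = resolve_device(args.device, *detect_backends())
    print(f"mode={args.mode} device={device}")


if __name__ == "__main__":
    main()
```

Keeping `resolve_device` a pure function of two booleans makes the `auto` policy easy to test on a machine without any accelerator.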

Model Architecture

Architecture type: DatasetTransformer (CDE)
Embedder: nomic-ai/nomic-bert-2048
Reranker: sentence-transformers/gtr-t5-base
Embedding dimension: 768
Maximum sequence length: 512

Component	Description
first_stage_model	First-stage embedding model
second_stage_model	Second-stage Transformer model
output_projection	Output projection layer

Inference Configuration

Extracted from config.json:

{
  "architecture": "transductive",
  "embedder": "nomic-ai/nomic-bert-2048",
  "embedder_rerank": "sentence-transformers/gtr-t5-base",
  "max_seq_length": 512,
  "logit_scale": 50.0
}
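A small sketch of how two of these fields could be consumed, under stated assumptions: `max_seq_length` is applied as plain right truncation of token ids, and `logit_scale` multiplies cosine similarities the way contrastive embedding models usually form logits. The source does not say how inference.py uses either field, and `truncate_ids` and `scaled_cosine` are hypothetical names.

```python
import json
import math
from typing import List, Sequence

# Fields copied from the config.json excerpt above.
CONFIG_JSON = """
{
  "architecture": "transductive",
  "embedder": "nomic-ai/nomic-bert-2048",
  "embedder_rerank": "sentence-transformers/gtr-t5-base",
  "max_seq_length": 512,
  "logit_scale": 50.0
}
"""


def truncate_ids(token_ids: Sequence[int], max_seq_length: int) -> List[int]:
    """Keep at most max_seq_length token ids (right truncation, assumed)."""
    return list(token_ids[:max_seq_length])


def scaled_cosine(a: Sequence[float], b: Sequence[float], logit_scale: float) -> float:
    """Cosine similarity multiplied by logit_scale (contrastive-logit style, assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return logit_scale * dot / norm


config = json.loads(CONFIG_JSON)
print(len(truncate_ids(range(600), config["max_seq_length"])))
print(scaled_cosine([1.0, 0.0], [0.6, 0.8], config["logit_scale"]))
```

Because `logit_scale` is a positive constant, it changes the magnitude of similarity scores but not their ranking, so it matters for training losses and softmax temperatures rather than for nearest-neighbour retrieval.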

Known Limitations

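Network dependency: the model needs the nomic-bert-2048 embedder, which must be downloaded to a local directory in advance.
Weight mapping: keys with the bert. prefix must be remapped manually for the safetensors weights to load correctly.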

Test Results

Inference Test

Text embedding extraction runs successfully on the NPU; each sentence yields a 768-dimensional, L2-normalized embedding vector.
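The shapes in the sample output ([1, N] token ids in, [1, 768] out, norm 1.0000) imply a pooling step over tokens followed by L2 normalization. Below is a minimal pure-Python sketch of that post-processing, assuming masked mean pooling; the source does not state which pooling this pipeline uses, and `mean_pool` and `l2_normalize` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average the vectors of tokens whose mask value is 1 (padding is ignored)."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale vec to unit Euclidean norm, which is why the norm prints as 1.0000."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


# Three toy "tokens" of dimension 2; the last one is padding.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
embedding = l2_normalize(pooled)
print(pooled, math.sqrt(sum(x * x for x in embedding)))
```

With real tensors the normalization step is usually a single call to `torch.nn.functional.normalize(x, p=2, dim=-1)`.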

Sample test output:

Input: Hello, this is a test sentence.
Input shape: torch.Size([1, 10])
Inference time: 0.368s
Embedding shape: torch.Size([1, 768])
Embedding norm: 1.0000

Input: This is another example for embedding.
Input shape: torch.Size([1, 11])
Inference time: 0.032s
Embedding shape: torch.Size([1, 768])
Embedding norm: 1.0000

Input: CDE model for text embedding extraction
Input shape: torch.Size([1, 11])
Inference time: 0.026s
Embedding shape: torch.Size([1, 768])
Embedding norm: 1.0000

Inference Summary
Total sentences processed: 3
Total inference time: 0.426s
Average time per sentence: 0.142s
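The summary lines are plain arithmetic over the three per-sentence times: 0.368 + 0.032 + 0.026 = 0.426 s, and 0.426 / 3 = 0.142 s. The first call is more than ten times slower than the other two, which is consistent with a one-time warm-up cost on the NPU, so the reported average overstates steady-state latency. The sketch below (hypothetical helper `summarize`) reproduces the reported numbers and also computes the average with the first call excluded, about 0.029 s.

```python
from typing import Sequence

# Per-sentence inference times (seconds) from the sample output above.
TIMES = [0.368, 0.032, 0.026]


def summarize(times: Sequence[float], skip_warmup: bool = False) -> dict:
    """Count, total and mean latency; optionally drop the first (warm-up) call."""
    sample = list(times[1:] if skip_warmup else times)
    if not sample:
        raise ValueError("no timings to summarize")
    total = sum(sample)
    return {"count": len(sample), "total_s": total, "mean_s": total / len(sample)}


print(summarize(TIMES))                    # all three calls, as in the summary
print(summarize(TIMES, skip_warmup=True))  # steady-state calls only
```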

Precision Test (CPU vs NPU)

Max relative error: 0.20%
Cosine similarity: 0.9999
Precision threshold: 1.0%
Result: PASS
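The source reports these two metrics without defining them. A minimal sketch, assuming element-wise relative error measured against the CPU embedding (with a small `eps` to avoid division by zero) and a pass criterion based only on the 1.0% threshold; `max_relative_error`, `cosine_similarity` and `precision_check` are hypothetical names. Element-wise relative error can be inflated by components close to zero, which is one reason to report cosine similarity alongside it.

```python
import math
from typing import Sequence, Tuple


def max_relative_error(ref: Sequence[float], out: Sequence[float], eps: float = 1e-6) -> float:
    """Largest element-wise |ref - out| / (|ref| + eps); ref is the CPU result."""
    if len(ref) != len(out):
        raise ValueError("embeddings must have the same length")
    return max(abs(r - o) / (abs(r) + eps) for r, o in zip(ref, out))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def precision_check(
    cpu: Sequence[float], npu: Sequence[float], threshold: float = 0.01
) -> Tuple[float, float, bool]:
    """Return (max relative error, cosine similarity, passed) for a 1.0% threshold."""
    rel = max_relative_error(cpu, npu)
    cos = cosine_similarity(cpu, npu)
    return rel, cos, rel <= threshold


rel, cos, ok = precision_check([0.5, -0.25, 0.1], [0.501, -0.25, 0.1])
print(f"Max relative error: {rel:.2%}  Cosine similarity: {cos:.4f}  PASS={ok}")
```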

Test Log

The full test log is saved in test.log.

References

Original model: https://huggingface.co/jxm/cde-small-v1
MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard

License

This project is licensed under Apache-2.0.