xujiashuai/timm-nlp_gte_sentence-embedding_chinese-base

nlp_gte_sentence-embedding_chinese-base on Ascend NPU

1. Introduction

This document records the adaptation and verification results for iic/nlp_gte_sentence-embedding_chinese-base on Ascend NPUs.

Model source: iic/nlp_gte_sentence-embedding_chinese-base
Parameters: 102,267,648 (~102M)
Adaptation status: SUCCESS
Adaptation date: 2026-05-17
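
The parameter count above can be cross-checked by hand. The sketch below assumes a standard BERT-base layout (12 layers, hidden size 768, intermediate size 3072, 512 positions, 2 token types, a pooler head) with a 21,128-token Chinese vocabulary. None of these configuration values are stated in this document; they are assumptions that happen to reproduce the reported total exactly. The function name `bert_param_count` is illustrative.

```python
def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def bert_param_count(vocab=21128, hidden=768, layers=12,
                     intermediate=3072, max_pos=512, type_vocab=2):
    """Count parameters of a BERT-style encoder with a pooler (assumed config)."""
    layer_norm = 2 * hidden  # scale + shift
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    attention = 4 * linear(hidden, hidden) + layer_norm  # Q, K, V, output projection
    ffn = linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
    pooler = linear(hidden, hidden)
    return embeddings + layers * (attention + ffn) + pooler


total = bert_param_count()
print(f"{total:,}")                      # 102,267,648
print(f"FP16 weights: {total * 2 / 1e9:.2f} GB")  # 0.20
print(f"FP32 weights: {total * 4 / 1e9:.2f} GB")  # 0.41
```

Under the same assumptions, FP16 weights alone come to roughly 0.20 GB, which is in the same range as the 0.21 GB peak memory reported later in this document.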

2. Verification environment

Component	Version
`torch`	`2.9.0`
`torch-npu`	`2.9.0.post1`
`transformers`	`4.57.6`
`CANN`	`8.5.1`

NPU: Ascend 910B4 (2 cards, 61.27 GB each)
System: Linux aarch64
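
To compare a local environment against the versions listed above, something like the following could be used. This is a minimal sketch and not the contents of `env_check.py`, which are not shown here. `check_versions` and `EXPECTED` are hypothetical names, and CANN is left out because it is not a Python package.

```python
from importlib import metadata

# Versions taken from the table above.
EXPECTED = {
    "torch": "2.9.0",
    "torch-npu": "2.9.0.post1",
    "transformers": "4.57.6",
}


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed
        report[name] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, installed {have} [{status}]")
```

The comparison is an exact string match, so a build with a local suffix (for example `2.9.0+cpu`) would be flagged as a mismatch. Whether that is desirable depends on how strict the check needs to be.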

3. Inference script

python inference.py --model-id iic/nlp_gte_sentence-embedding_chinese-base --device npu:0

Alternatively, use evaluate.py for full verification:

python evaluate.py --model-id iic/nlp_gte_sentence-embedding_chinese-base --device npu:0 --output report.json

4. Inference output evidence

Actual output from running inference.py:

$ python3 inference.py --model-id iic/nlp_gte_sentence-embedding_chinese-base --device npu:0

[1/5] 加载模型: iic/nlp_gte_sentence-embedding_chinese-base
  参数量: 102,267,648
[2/5] 迁移到 npu:0
[3/5] Tokenize 输入 (max_length=128)
[4/5] 运行推理验证
  输出形状: [1, 128, 768]
  是否有 NaN: False
[5/5] 性能基准测试 (10 轮)
  平均延迟: 8.35 ms
  峰值显存: 0.21 GB
[额外] CPU vs NPU 精度对比
  Cosine Similarity: 0.999991
  Max Abs Error: 0.012462
  精度误差: 0.0009%
  ✅ 精度满足要求（< 1%）

✓ 验证通过

Smoke verification summary

Metric	Result
Output shape	`[1, 128, 768]`
Contains NaN	No ✅
Inference status	Normal ✅
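
The two smoke checks above (output shape and absence of NaN) can be illustrated in plain Python on nested lists. This is a sketch of the idea only. With PyTorch tensors one would normally inspect `tensor.shape` and `torch.isnan(tensor).any()` instead. The helper names are hypothetical.

```python
import math


def shape_of(nested):
    """Shape of a rectangular nested list, e.g. [[1.0, 2.0]] -> [1, 2]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape


def has_nan(nested):
    """True if any scalar inside the nested list is NaN."""
    if isinstance(nested, list):
        return any(has_nan(item) for item in nested)
    return math.isnan(nested)


def smoke_check(output, expected_shape):
    """Pass only when the shape matches and no NaN is present."""
    return shape_of(output) == expected_shape and not has_nan(output)


# Toy stand-in for a [batch, seq_len, hidden] output, here [1, 2, 3].
toy = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
print(shape_of(toy), has_nan(toy), smoke_check(toy, [1, 2, 3]))
```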

5. Performance reference

Metric	Value
Average latency	8.35 ms
Peak device memory	0.21 GB
Test rounds	10
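
An average latency over a fixed number of rounds can be measured roughly as follows. This is a generic sketch, not the code from `inference.py`. The warm-up count and the `sync` hook are assumptions. On an NPU, a device synchronization call such as `torch.npu.synchronize` would typically be passed as `sync` so that asynchronous kernels finish before the timer stops.

```python
import time


def benchmark(fn, rounds=10, warmup=3, sync=None):
    """Return the mean latency of fn() in milliseconds over `rounds` calls."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    if sync is not None:
        sync()

    latencies_ms = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for device work before reading the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return sum(latencies_ms) / len(latencies_ms)


if __name__ == "__main__":
    # Toy workload standing in for a model forward pass.
    avg = benchmark(lambda: sum(range(10_000)), rounds=10)
    print(f"average latency: {avg:.3f} ms")
```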

6. Accuracy evaluation

✅ NPU vs CPU accuracy comparison

Metric	Value
Cosine Similarity	0.999991
Max Abs Error	0.012462
Accuracy error	0.0009%
Meets requirement	Yes (< 1%) ✅
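
The metrics in this table can be computed from two embedding vectors as sketched below. The formula for the accuracy error is an assumption: this document does not state it, but `(1 - cosine) * 100` matches the reported pair (0.999991 and 0.0009%). The function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_error(a, b):
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def error_percent(cosine):
    """Assumed definition: distance from perfect similarity, in percent."""
    return (1.0 - cosine) * 100.0


# Toy CPU and NPU embeddings (made-up numbers, for illustration only).
cpu = [0.10, -0.20, 0.30, 0.40]
npu = [0.10, -0.21, 0.30, 0.41]
print(f"cosine = {cosine_similarity(cpu, npu):.6f}")
print(f"max abs error = {max_abs_error(cpu, npu):.6f}")
print(f"reported case: {error_percent(0.999991):.4f}%")  # 0.0009%
```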

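7. Evaluation materials

Material	File	Description
Inference script	`inference.py`	Standalone, runnable NPU inference code
Accuracy evaluation code	`evaluate.py`	CPU vs NPU cosine similarity comparison
Environment check	`env_check.py`	NPU environment verification script
Run logs	`logs/*.log`	Full execution logs (reproducible)
Self-verification screenshots	`screenshots/`	Terminal verification screenshots
Accuracy report	`report.json`	Structured evaluation data
Deployment guide	`DEPLOY.md`	Environment setup and verification guide
Dependency list	`requirements.txt`	Python dependencies (install with uv/pip)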

8. Agent Skill

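This model adaptation was carried out by the following Agent Skill (required by item 6.2).

Item	Details
Skill name	`text-encoder-npu-adapt`
Trigger	Adapting BERT/GTE text encoders to Ascend NPU
Models covered	NLP sentence encoder models
Core capabilities	Text encoding, mean pooling, FP16 inference, accuracy verification, performance benchmarking

Usage

Executed automatically by the Agent:

# Download the model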
python -c "from modelscope.hub.snapshot_download import snapshot_download; snapshot_download('iic/nlp_gte_sentence-embedding_chinese-base', cache_dir='./models')"

# Run verification
python wave1/1h_nlp_encoder/evaluate_nlp.py \
  --model-id iic/nlp_gte_sentence-embedding_chinese-base \
  --device npu:0 --dtype float16 --max-length 128 \
  --cache-dir ./models/iic/nlp_gte_sentence-embedding_chinese-base \
  --output report.json
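
For readers who want to build a similar script, the flags used in the command above could be declared with `argparse` roughly like this. Only the flag names come from this document. The defaults, the allowed `--dtype` choices, and the help strings are assumptions, and the real `evaluate_nlp.py` may define them differently.

```python
import argparse


def build_parser():
    """Parser mirroring the flags seen in the verification command (sketch)."""
    parser = argparse.ArgumentParser(description="NLP encoder verification (sketch)")
    parser.add_argument("--model-id", required=True, help="model identifier")
    parser.add_argument("--device", default="npu:0", help="target device")
    parser.add_argument("--dtype", default="float16",
                        choices=["float16", "float32"],  # assumed choices
                        help="inference precision")
    parser.add_argument("--max-length", type=int, default=128,
                        help="tokenizer max sequence length")
    parser.add_argument("--cache-dir", default=None, help="local model directory")
    parser.add_argument("--output", default="report.json", help="report path")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```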

Manual reproduction steps

# Step 1: Check the environment
python3 env_check.py

# Step 2: Verify the model
python3 evaluate.py --model-id iic/nlp_gte_sentence-embedding_chinese-base --device npu:0 --output report.json

# Step 3: Run inference
python3 inference.py --model-id iic/nlp_gte_sentence-embedding_chinese-base --device npu:0

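9. Notes

The first run downloads the model weights from ModelScope (HuggingFace is not reachable from mainland China).
The model uses Mean Pooling to extract the sentence embedding.
The CPU vs NPU comparison computes cosine similarity on the embeddings obtained after Mean Pooling.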
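
Mean Pooling, as mentioned in the notes above, averages the per-token vectors into a single sentence vector. The sketch below shows the common masked variant, in which padded positions are excluded via the attention mask. Whether the scripts in this repository apply the mask in exactly this way is an assumption. For a `[1, 128, 768]` output, the result per sentence would be a 768-dimensional vector.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors where the mask is 1.

    token_embeddings: list of seq_len vectors, each of length hidden
    attention_mask:   list of seq_len ints (1 = real token, 0 = padding)
    """
    hidden = len(token_embeddings[0])
    summed = [0.0] * hidden
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                summed[i] += value
    kept = max(kept, 1)  # avoid division by zero for an all-padding input
    return [s / kept for s in summed]


# Two real tokens and one padding token with hidden size 2.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding vector `[9.0, 9.0]` does not affect the result, which is the reason for masking: otherwise the embedding would depend on how much padding a sentence received.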

Contributor: xujiashuai | Track: Model Adaptation | Submitted: 2026-05-17