nlp_structbert_sentence-similarity_chinese-tiny

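1. Introduction

This document records the quick-deployment and verification results for iic/nlp_structbert_sentence-similarity_chinese-tiny on an Ascend NPU (Ascend910).

The model is a StructBERT sentence-similarity model with a BertModel architecture, distributed through the ModelScope framework. It supports semantic matching of Chinese text.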

2. Verification environment

Component	Version
`torch`	`2.5.1`
`torch_npu`	`2.5.1`
`transformers`	`>=4.48.0`
`CANN`	`8.5.RC1`

NPU: Ascend910 (single card)
Hidden size: 256
Maximum sequence length: 512
Inference framework: PyTorch + transformers

3. Quick deployment

3.1 Environment setup

pip install torch==2.5.1 torch_npu==2.5.1 "transformers>=4.48.0"

3.2 Inference code

import os

import torch
from transformers import BertConfig, BertModel, BertTokenizer

try:
    import torch_npu  # noqa: F401  (importing it registers the torch.npu backend)
except ImportError:
    torch_npu = None  # torch_npu not installed: fall back to CPU

use_npu = torch_npu is not None and torch.npu.is_available()
device = torch.device("npu:0" if use_npu else "cpu")
model_path = "/path/to/model"

tokenizer = BertTokenizer.from_pretrained(model_path)

# The checkpoint keys carry an extra "encoder." prefix; strip it once so the
# names match a standard BertModel.
state = torch.load(os.path.join(model_path, "pytorch_model.bin"), map_location="cpu")
prefix = "encoder."
new_state = {k[len(prefix):] if k.startswith(prefix) else k: v for k, v in state.items()}

config = BertConfig.from_pretrained(model_path)
model = BertModel(config)
# strict=False tolerates keys that do not match BertModel; print the missing
# ones so that a silent mismatch does not go unnoticed.
result = model.load_state_dict(new_state, strict=False)
print(f"Missing keys: {result.missing_keys}")
model = model.to(device).eval()

sentences = ["今天天气很好", "今天天气不错", "明天会下雨"]
inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.pooler_output  # one vector per input sentence

print(f"Embedding shape: {embeddings.shape}")

4. Smoke test

python3 inference.py

Results:

The model loads onto npu:0 successfully
The output sentence-embedding dimension is correct
Inference completes without errors

5. Performance reference

Test conditions: FP32 / batch=8 / warmup=5 / timed=100 runs, single Ascend910 card.

Metric	Value
Average inference time	`2.67 ms`
QPS (queries per second)	`2993.24`
Timed runs	`100`
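
The benchmark script itself is not included in this document. The following is a minimal sketch of a warmup-then-timed loop matching the stated conditions (warmup=5, timed=100). The function name `benchmark` and its parameters are hypothetical, and QPS is assumed to count sentences (batch size × runs / elapsed time), which is consistent with the reported figures (8 / 2.67 ms ≈ 2996).

```python
import time
from typing import Callable, Optional


def benchmark(run_batch: Callable[[], object],
              batch_size: int,
              warmup: int = 5,
              timed: int = 100,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Time run_batch() and return average latency in ms and sentences/s.

    sync is an optional device-synchronization callback; asynchronous
    devices need it so that queued work is finished before the clock stops.
    """
    def wait() -> None:
        if sync is not None:
            sync()

    # Warmup runs are executed but not timed.
    for _ in range(warmup):
        run_batch()
    wait()

    start = time.perf_counter()
    for _ in range(timed):
        run_batch()
    wait()
    elapsed = time.perf_counter() - start

    return {
        "avg_ms": elapsed / timed * 1000.0,
        # Assumption: QPS counts individual sentences, not batches.
        "qps": batch_size * timed / elapsed,
        "runs": timed,
    }
```

With the model and inputs from section 3.2, a call might look like `benchmark(lambda: model(**inputs), batch_size=8, sync=torch.npu.synchronize)`, assuming `torch_npu` is imported and the batch holds 8 sentences.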

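6. Accuracy evaluation

NPU and CPU outputs are compared on 8 test sentences using the cosine similarity of the embeddings.

Metric	Value
Mean cosine similarity	`1.0`
Minimum cosine similarity	`1.0`
Maximum vector difference	`0.000325`
Mean vector difference	`7.2e-05`
Maximum relative error	`0.1468%`
Result	`PASS`

Pass criteria: maximum relative error between NPU and CPU sentence embeddings < 1%, and mean cosine similarity > 0.999.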
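
The metrics above can be computed along the following lines. This is a sketch under stated assumptions, not the evaluation script used for the table: the function names are hypothetical, embeddings are passed as plain nested lists (for example via `embeddings.cpu().tolist()`), and "relative error" is assumed to mean the largest element-wise difference divided by the largest absolute CPU value. The document does not define that metric, so the real definition may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_embeddings(npu: Sequence[Sequence[float]],
                       cpu: Sequence[Sequence[float]],
                       eps: float = 1e-12) -> dict:
    """Compare per-sentence NPU embeddings against the CPU reference."""
    cosines = [cosine_similarity(n, c) for n, c in zip(npu, cpu)]
    abs_diffs = [abs(x - y) for n, c in zip(npu, cpu) for x, y in zip(n, c)]
    # Assumed definition of relative error: max |npu - cpu| / max |cpu|.
    scale = max(abs(y) for c in cpu for y in c)
    max_rel_error = max(abs_diffs) / max(scale, eps)
    mean_cosine = sum(cosines) / len(cosines)
    return {
        "mean_cosine": mean_cosine,
        "min_cosine": min(cosines),
        "max_abs_diff": max(abs_diffs),
        "mean_abs_diff": sum(abs_diffs) / len(abs_diffs),
        "max_rel_error": max_rel_error,
        # Pass criteria as stated in this section.
        "passed": max_rel_error < 0.01 and mean_cosine > 0.999,
    }
```

Cosine similarity alone is insensitive to a uniform scaling of the vectors, which is one reason to report the absolute and relative differences alongside it.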

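7. Notes

The model ships in a ModelScope-specific format: weight keys carry an encoder. prefix, which must be stripped manually before loading into a standard BertModel
Because a text-embedding model outputs floating-point vectors, accuracy is verified with cosine similarity rather than classification agreement
The cosine similarity between NPU and CPU embeddings reaches 1.0, with a maximum relative error below 1%
The maximum sequence length is 512; longer inputs are truncated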
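
The prefix handling in the first note can be isolated and checked without loading any tensors. The sketch below uses hypothetical helper names (`strip_prefix`, `check_keys`) and works on plain dictionaries. It removes only the leading occurrence of the prefix, since BertModel itself has parameters under `encoder.layer.*` that must keep their name, and it reports which expected keys are still missing afterwards.

```python
def strip_prefix(state: dict, prefix: str = "encoder.") -> dict:
    """Return a copy of state with a leading prefix removed from each key.

    Only the first occurrence is removed, so a checkpoint key such as
    "encoder.encoder.layer.0..." becomes "encoder.layer.0...".
    """
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state.items()
    }


def check_keys(remapped: dict, expected: set) -> tuple:
    """Return (missing, unexpected) key lists relative to the expected names."""
    present = set(remapped)
    missing = sorted(expected - present)
    unexpected = sorted(present - expected)
    return missing, unexpected
```

In practice `expected` could be taken from `set(model.state_dict().keys())`, and a non-empty `missing` list would indicate that the remapping does not fit the checkpoint.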
