iic/nlp_structbert_sentiment-classification_chinese-base on Ascend NPU

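1. Introduction

This document records the adaptation and validation results for iic/nlp_structbert_sentiment-classification_chinese-base on Huawei Ascend NPUs.

This is a StructBERT Chinese sentiment classification model, fine-tuned from Structbert-base-chinese on four datasets (bdci, dianping, jd binary, and waimai-10k; about 115,000 samples in total). Given natural-language text, the model returns a sentiment label (0: negative, 1: positive) together with the corresponding probability.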

Model type: StructBERT (text-classification)
Parameters: about 102M
Framework: PyTorch + transformers + torch_npu
Supported languages: Chinese, English

Model source:

ModelScope: https://modelscope.cn/models/iic/nlp_structbert_sentiment-classification_chinese-base

2. Validation environment

Component	Version
NPU	Ascend 910 (2 cards)
NPU driver	25.5.2
PyTorch	2.9.0
torch_npu	integrated
ModelScope	1.35.3

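3. Starting the service

This is a BERT-architecture sentiment classification model, so it cannot be served through vLLM. Run inference directly from a Python script that calls ModelScope: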

cd /opt/atomgit/iic/nlp_structbert_sentiment-classification_chinese-base
python3 inference.py

Inference example:

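from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build a text-classification pipeline for the sentiment model.
semantic_cls = pipeline(Tasks.text_classification, 'iic/nlp_structbert_sentiment-classification_chinese-base')
# Classify one Chinese review sentence.
result = semantic_cls(input='启动的时候很大声音，然后就会听到1.2秒的卡察的声音，类似齿轮摩擦的声音')
# Prints a dict with 'scores' and 'labels'.
print(result)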

4. Smoke test

cd /opt/atomgit/iic/nlp_structbert_sentiment-classification_chinese-base
python3 inference.py

Expected output:

{'scores': [0.9279, 0.0721], 'labels': ['负面', '正面']}
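
Each entry in 'labels' pairs with the score at the same index in 'scores'. As a minimal sketch, assuming the pipeline always returns a dict of this shape, the predicted label can be read out as follows. Note that top_prediction is a hypothetical helper for illustration, not part of ModelScope.

```python
def top_prediction(result):
    """Return the (label, score) pair with the highest score."""
    pairs = zip(result["labels"], result["scores"])
    return max(pairs, key=lambda pair: pair[1])


# Sample result copied from the expected output above.
sample = {"scores": [0.9279, 0.0721], "labels": ["负面", "正面"]}
label, score = top_prediction(sample)
print(label, score)  # prints: 负面 0.9279
```

Here 负面 is the model's "negative" label and 正面 is its "positive" label.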

5. Performance reference

Test conditions: a single Ascend 910 card, batch_size=1.

Metric	Value
Average latency	8.10 ms
P50 latency	7.95 ms
P90 latency	8.05 ms
P99 latency	8.17 ms
Throughput	123.53 samples/sec
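
For readers who want to reproduce numbers of this kind, the sketch below shows one way such latency statistics could be computed. It is not the code in eval/perf_test.py, which is not shown here. The function names, the nearest-rank percentile definition, and the warmup and run counts are all assumptions. On an NPU, the callable passed to measure would also need to wait for the device to finish before returning, otherwise the timings only cover kernel launch.

```python
import statistics
import time


def summarize_latencies(latencies_ms):
    """Summarize per-request latencies given in milliseconds."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile, using integer ceil(p * n / 100).
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": mean_ms,
        "p50_ms": percentile(50),
        "p90_ms": percentile(90),
        "p99_ms": percentile(99),
        # With batch_size=1 and sequential requests, throughput is 1000 / mean latency.
        "samples_per_sec": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def measure(fn, warmup=5, runs=100):
    """Time a zero-argument callable and return summary statistics."""
    for _ in range(warmup):
        fn()  # warmup runs are excluded from the statistics
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return summarize_latencies(latencies)
```

If throughput was derived as 1000 / mean latency, the reported 123.53 samples/sec corresponds to a mean of about 8.095 ms, which rounds to the 8.10 ms in the table.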

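6. Accuracy evaluation

Accuracy on 20 test samples (10 positive, 10 negative):

Metric	Value
Total samples	20
Correct	20
Accuracy	100.00%

All test samples were classified correctly; on this test set, inference accuracy on the NPU matches the original model.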

Run the accuracy test:

cd /opt/atomgit/iic/nlp_structbert_sentiment-classification_chinese-base
python3 eval/accuracy_test.py

Run the performance test:

cd /opt/atomgit/iic/nlp_structbert_sentiment-classification_chinese-base
python3 eval/perf_test.py

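7. Notes

This model uses the StructBERT (encoder-only) architecture and cannot be served through vLLM or other frameworks built for decoder-only models.
When using the ModelScope pipeline, the device parameter only accepts the "cpu", "cuda", and "gpu" formats, so the model must be moved to the NPU manually with model.to("npu:0").
Unsetting the TORCH_NPU_LOGGING environment variable is recommended to reduce log output.
The first inference includes model loading time (about 2-3 seconds); subsequent inferences reflect pure compute time.
The model weight file is about 390 MB, and inference needs about 2 GB of device memory.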
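
Because the model has to be moved to the NPU by hand, it can help to choose the target device string in one place. The sketch below is one possible approach, not the code in inference.py. The pick_device helper is hypothetical, and it assumes that importing torch_npu registers the torch.npu namespace, so that torch.npu.is_available() can be queried.

```python
def pick_device(index=0):
    """Return "npu:<index>" when an Ascend NPU is usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (assumption: the import registers torch.npu)
    except ImportError:
        # torch or torch_npu is not installed, so fall back to the CPU.
        return "cpu"
    return f"npu:{index}" if torch.npu.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string could then be passed to the model's .to() call, for example semantic_cls.model.to(device), assuming the pipeline object exposes the underlying PyTorch module as its model attribute.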

8. Ascend NPU precision check

NPU vs. CPU precision comparison (CPU is the baseline, NPU is the target under validation):

Metric	Value
Test cases	6
Max logits difference	0.00113034
Prediction agreement	6/6 (100%)
Precision requirement	NPU vs. CPU max logits error < 1%
Conclusion	Pass (difference below 1%)
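
The two quantities in this table, the maximum logits difference and the prediction agreement, can be computed as in the sketch below. It is not the script under eval/. The compare_logits helper is hypothetical, the logits are made-up illustrative values rather than measured results, and reading the "< 1%" requirement as an absolute tolerance of 0.01 is an assumption.

```python
def compare_logits(cpu_logits, npu_logits):
    """Return (max absolute difference, agreeing predictions, total cases)."""
    max_abs_diff = 0.0
    agree = 0
    for cpu_row, npu_row in zip(cpu_logits, npu_logits):
        row_diff = max(abs(c - n) for c, n in zip(cpu_row, npu_row))
        max_abs_diff = max(max_abs_diff, row_diff)
        # The predicted class is the index of the largest logit.
        cpu_pred = max(range(len(cpu_row)), key=cpu_row.__getitem__)
        npu_pred = max(range(len(npu_row)), key=npu_row.__getitem__)
        agree += int(cpu_pred == npu_pred)
    return max_abs_diff, agree, len(cpu_logits)


# Made-up logits for two inputs (not measured values).
cpu = [[2.0, -1.0], [-0.5, 1.5]]
npu = [[2.001, -1.0005], [-0.5, 1.4995]]

max_diff, agree, total = compare_logits(cpu, npu)
TOLERANCE = 0.01  # assumption: "< 1%" read as an absolute tolerance
passed = max_diff < TOLERANCE and agree == total
print(max_diff, f"{agree}/{total}", passed)
```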

The evaluation source code and logs are in the eval/ directory.