KoalaAI/Text-Moderation on Ascend NPU

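1. Introduction

This document records the migration, accuracy evaluation, and performance validation results for the KoalaAI/Text-Moderation text content moderation model on Ascend NPU (Ascend 910B3).

The model is based on DistilBERT (6 layers, hidden size 768) and was trained for multi-label text moderation, so it can detect several violation categories at once, including toxic, obscene, threat, insult, and identity_hate. It uses sigmoid activation for multi-label classification (a single text can trigger multiple violation labels) and is suited to scenarios such as community content moderation and comment filtering.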

2. Validation Environment

Component	Version
`torch`	`2.8.0`
`torch_npu`	`2.8.0.post4`
`transformers`	`5.8.1`
`CANN`	`8.5.1`

NPU: 8 × Ascend 910B3
Accuracy baseline: CPU (x86, PyTorch 2.8.0)

3. Deployment and Usage

3.1 Environment Setup

conda create -n KoalaAI_Text-Moderation python=3.11 -y
conda activate KoalaAI_Text-Moderation

pip install torch==2.8.0 torch_npu==2.8.0.post4 \
    -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install transformers numpy \
    -i https://pypi.tuna.tsinghua.edu.cn/simple

3.2 Using the Inference Script

python inference.py --text "This is a toxic comment example." --device npu

Programmatic interface:

from inference import PersonalityClassifier
clf = PersonalityClassifier(model_path="./KoalaAI_Text-Moderation", device="npu")
results, probs = clf.predict(["This is a toxic comment."])

4. Smoke Test

python inference.py --text "This is a normal, friendly comment." --device npu

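Expected output: one probability per violation category (sigmoid outputs). For benign text, every probability should fall below the threshold, and no runtime error should occur.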

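5. Performance Reference

Test conditions: 10 mixed-quality texts, batch_size=16, one NPU warm-up round.

Metric	Value
CPU throughput	`23.1` texts/s
NPU throughput	`210.2` texts/s
NPU speedup over CPU	`9.1` ×

The 6-layer DistilBERT architecture achieves a 9.1× speedup on NPU, which suits high-throughput real-time moderation pipelines.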
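The measurement described above (fixed text set, fixed batch size, untimed warm-up round) can be sketched as follows. This is a minimal illustration, not the script used to produce the table: `measure_throughput` and the `predict_batch` callable are hypothetical names, and the sketch assumes `predict_batch` returns only once its results are ready (for NPU inference that usually means copying outputs back to the CPU, since device kernels run asynchronously).

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    predict_batch: Callable[[Sequence[str]], object],  # hypothetical inference callable
    texts: Sequence[str],
    batch_size: int = 16,
    warmup_rounds: int = 1,
) -> float:
    """Return texts per second for `predict_batch` over `texts`."""
    if not texts:
        raise ValueError("texts must not be empty")
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    # Warm-up passes are excluded from timing (operator compilation, caches).
    for _ in range(warmup_rounds):
        for batch in batches:
            predict_batch(batch)

    # Timed pass: assumes predict_batch blocks until outputs are available.
    start = time.perf_counter()
    for batch in batches:
        predict_batch(batch)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return len(texts) / elapsed
```

Running the same function once with a CPU-backed callable and once with an NPU-backed one, then dividing the two results, gives a speedup figure comparable to the one in the table.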

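6. Accuracy Evaluation

6.1 Evaluation Method

The same 10 mixed-quality texts are run on CPU and on NPU, and the resulting multi-label sigmoid probability vectors are compared by cosine similarity, MAE, and Top-1 agreement.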
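A minimal sketch of how these comparison metrics could be computed from two sets of probability vectors is shown below. The function name `compare_outputs` and the result keys are illustrative assumptions, not part of the evaluation script; the sketch also assumes every vector is non-zero, which holds for sigmoid outputs.

```python
import math
from typing import Dict, List, Sequence


def compare_outputs(
    cpu_probs: Sequence[Sequence[float]],
    npu_probs: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Compare per-text probability vectors from two devices (hypothetical helper)."""
    if not cpu_probs or len(cpu_probs) != len(npu_probs):
        raise ValueError("inputs must be non-empty and have the same number of rows")

    cos_sims: List[float] = []
    abs_errors: List[float] = []
    top1_matches = 0
    for cpu_row, npu_row in zip(cpu_probs, npu_probs):
        if len(cpu_row) != len(npu_row):
            raise ValueError("rows must have the same number of labels")
        dot = sum(c * n for c, n in zip(cpu_row, npu_row))
        norm = math.sqrt(sum(c * c for c in cpu_row)) * math.sqrt(sum(n * n for n in npu_row))
        cos_sims.append(dot / norm)  # assumes non-zero vectors
        abs_errors.extend(abs(c - n) for c, n in zip(cpu_row, npu_row))
        # Top-1 agreement: both devices rank the same label highest.
        cpu_top = max(range(len(cpu_row)), key=cpu_row.__getitem__)
        npu_top = max(range(len(npu_row)), key=npu_row.__getitem__)
        top1_matches += int(cpu_top == npu_top)

    return {
        "mean_cosine_similarity": sum(cos_sims) / len(cos_sims),
        "mae": sum(abs_errors) / len(abs_errors),
        "max_abs_error": max(abs_errors),
        "top1_agreement": top1_matches / len(cpu_probs),
    }
```

In practice the two inputs would be the CPU and NPU probability tensors converted to nested lists, for example with `probs.cpu().tolist()`.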

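6.2 Evaluation Results

Metric	Value
Mean cosine similarity	`1.000000`
MAE	`0.000005`
Maximum error	`0.000062`
Accuracy error rate	`0.0000%`
Top-1 agreement	`100.0%`

Conclusion: the accuracy error rate is 0.0000%, and NPU outputs match CPU outputs to within a maximum error of 0.000062 with identical Top-1 labels, so the evaluation passes.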

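7. Migration Notes

7.1 Model Structure

Backbone: DistilBertModel (6 layers, hidden size 768, knowledge-distilled from BERT)
Classifier head: linear layer (768 → N_labels) with multi-label sigmoid activation
Tokenizer: BERT WordPiece (vocab.txt)
Weights: both pytorch_model.bin and safetensors formats
Parameters: 66.4M (60% of BERT-base's 110M, roughly 2× inference speed)

7.2 Adaptation Points

Load the model with AutoModelForSequenceClassification.from_pretrained()
Multi-label classification (sigmoid): a single text can trigger multiple violation labels
Move the model with model.to("npu:0"); operator compilation for the 6 DistilBERT layers takes about 2-3 seconds
Both weight formats are present; from_pretrained automatically prefers safetensors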
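Because the same checkpoint is also run on CPU as the accuracy baseline, it can be convenient to choose the device at runtime. The helper below is a sketch under assumptions: `pick_device` is a hypothetical name, and it relies on `torch.npu.is_available()`, which becomes available once `torch_npu` has been imported.

```python
def pick_device(preferred: str = "npu:0") -> str:
    """Return `preferred` when an Ascend NPU is usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the torch.npu backend)
    except (ImportError, OSError):
        # torch or torch_npu (or the underlying CANN libraries) is missing.
        return "cpu"
    return preferred if torch.npu.is_available() else "cpu"
```

The result can then be passed directly to `model.to(pick_device())` and used when moving tokenizer outputs to the same device.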

7.3 Key Code

import torch
import torch_npu  # registers the "npu" device with PyTorch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the classifier and move it to the first NPU.
model = AutoModelForSequenceClassification.from_pretrained(
    "KoalaAI/Text-Moderation"
).to("npu:0")
tokenizer = AutoTokenizer.from_pretrained("KoalaAI/Text-Moderation")

text = "This is a toxic and insulting comment."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
# Inputs must live on the same device as the model.
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits
    # Independent per-label probabilities (multi-label, not softmax).
    probs = torch.sigmoid(logits)
    # Keep only the labels whose probability exceeds the 0.5 threshold.
    flagged = {
        model.config.id2label[i]: float(p)
        for i, p in enumerate(probs[0]) if p > 0.5
    }

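8. Notes

Multi-label sigmoid: the model uses sigmoid activation (not softmax), so a single text can trigger several labels at once, such as toxic + insult + threat. Sensitivity is controlled through the threshold.
DistilBERT's lightweight advantage: the 6-layer architecture makes inference about 2× faster than BERT-base, which suits high-throughput moderation (chat or comment systems where every message must be reviewed).
Threshold tuning: the default is threshold=0.5; production deployments should adjust it to business needs. Lowering it raises recall at the cost of more false positives, while raising it reduces false positives and avoids flagging benign speech.
First NPU inference: DistilBERT has only 6 layers, so operator compilation takes about 2-3 seconds, making it one of the fastest BERT-family models to warm up.
Label set: toxic / obscene / threat / insult / identity_hate cover mainstream content moderation needs; the complete label list is available in model.config.id2label.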
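The threshold tuning described in the notes above can be illustrated with per-label thresholds applied to raw logits. This is a sketch only: `flag_labels` is a hypothetical helper, and the threshold values in the usage example are made-up illustrations, not recommended settings.

```python
import math
from typing import Dict, Mapping, Optional, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def flag_labels(
    logits: Sequence[float],
    labels: Sequence[str],
    thresholds: Optional[Mapping[str, float]] = None,  # hypothetical per-label overrides
    default_threshold: float = 0.5,
) -> Dict[str, float]:
    """Return {label: probability} for every label above its threshold."""
    thresholds = thresholds or {}
    flagged: Dict[str, float] = {}
    for label, logit in zip(labels, logits):
        prob = sigmoid(logit)
        # Each label is judged independently, so several can be flagged at once.
        if prob > thresholds.get(label, default_threshold):
            flagged[label] = prob
    return flagged
```

For example, `flag_labels([2.0, -1.0], ["toxic", "threat"], thresholds={"threat": 0.2})` flags both labels, whereas the default threshold of 0.5 flags only `toxic`; lowering the threshold for a high-risk category trades more false positives for higher recall.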