ModernBERT-base on Ascend NPU

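1. Introduction

This document records the adaptation, deployment, and precision validation of ModernBERT-base on Ascend NPU (Ascend 910B3).

ModernBERT-base is a modernized BERT model (about 149M parameters) with improved architectural efficiency and long-sequence handling (up to 2048 tokens). This project adapts the model for inference on Ascend NPU and verifies that the NPU output deviates from the CPU output by less than 1%.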

2. Validation Environment

Component	Version
`python`	`3.11.x`
`torch`	`2.10.0+cpu`
`torch_npu`	`2.10.0`
`transformers`	`5.8.1`
`safetensors`	`0.7.0`
`CANN`	`8.5.1`

NPU: Ascend 910B3 (8 cards)
Model path: /path/to/model
Framework: PyTorch + transformers

3. Model Information

Item	Value
Architecture	ModernBERT (ModernBertModel)
Parameters	~149M
Hidden size	768
Layers	22
Max sequence length	2048
Weight format	safetensors
Pretraining data	Web corpus
License	Apache-2.0

4. Conda Environment Setup

conda create -n modernbert python=3.11 -y
conda activate modernbert
pip install torch==2.10.0 --index-url https://repo.huaweicloud.com/repository/pypi/simple/
pip install torch_npu==2.10.0 --index-url https://repo.huaweicloud.com/repository/pypi/simple/
pip install transformers safetensors --index-url https://repo.huaweicloud.com/repository/pypi/simple/

If HuggingFace is unreachable, set a mirror endpoint:

export HF_ENDPOINT=https://hf-mirror.com/
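
After installation, it can help to compare the installed package versions with the table in section 2. The following is a minimal sketch, not part of the adaptation repository: the names `EXPECTED`, `installed_version`, and `check_versions` are hypothetical, and the comparison ignores local build tags such as `+cpu`.

```python
from importlib import metadata

# Versions taken from the validation-environment table in section 2.
EXPECTED = {
    "torch": "2.10.0",
    "torch_npu": "2.10.0",
    "transformers": "5.8.1",
    "safetensors": "0.7.0",
}


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for package, want in expected.items():
        found = lookup(package)
        # Drop a local build tag such as "+cpu" before comparing.
        base = found.split("+")[0] if found else None
        report[package] = (want, found, base == want)
    return report


if __name__ == "__main__":
    for package, (want, found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package:14s} expected={want:8s} found={found} {status}")
```

A mismatch does not necessarily mean the setup is broken; it only indicates that the environment differs from the one used for the results reported here.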

5. Running Inference

Text embedding extraction

# NPU inference (default)
python3 inference.py \
    --model_path /path/to/ModernBERT-base \
    --text "Your text here"

# CPU inference
python3 inference.py \
    --model_path /path/to/ModernBERT-base \
    --text "Your text here" \
    --device cpu
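
The source of inference.py is not reproduced in this document. As a rough illustration of what CLS embedding extraction might look like, here is a minimal sketch using the standard transformers `AutoTokenizer` / `AutoModel` interface. The function names `pick_device` and `embed_cls` and the CPU fallback behaviour are assumptions made for illustration, not the script's actual implementation.

```python
def pick_device(requested, npu_available):
    """Fall back to CPU when an NPU device is requested but unavailable."""
    if requested.startswith("npu") and not npu_available:
        return "cpu"
    return requested


def embed_cls(model_path, text, device="npu:0"):
    """Return the CLS-token embedding of `text` as a list of floats."""
    # Heavy imports stay local so pick_device() is usable without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    npu_available = False
    if device.startswith("npu"):
        try:
            import torch_npu  # noqa: F401  (registers the "npu" device)
            npu_available = torch.npu.is_available()
        except ImportError:
            pass
    device = pick_device(device, npu_available)

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModel.from_pretrained(model_path).to(device).eval()

    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    # Position 0 of the last hidden state corresponds to the CLS token.
    return outputs.last_hidden_state[0, 0].float().cpu().tolist()
```

With the hidden size listed in section 3, a call such as `embed_cls("/path/to/ModernBERT-base", "Your text here")` would be expected to return a vector of 768 values.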

Precision and performance benchmark

python3 benchmark.py \
    --model_path /path/to/ModernBERT-base \
    --npu_device npu:0

The benchmark log is written to log.txt.

6. Parameters

inference.py parameters

Parameter	Description	Default
`--model_path`	Path to the model weights	required
`--text`	Input text	`Hello`
`--device`	Device to run on	`npu:0`

benchmark.py parameters

Parameter	Description	Default
`--model_path`	Path to the model weights	required
`--npu_device`	NPU device ID	`npu:0`
`--num_warmup`	Number of NPU warm-up rounds	`3`
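
For reference, the benchmark.py parameters listed above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the table only; the function name `build_parser` and the help strings are assumptions, and the actual script may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the benchmark.py parameter table."""
    parser = argparse.ArgumentParser(
        description="Compare CPU and NPU inference (hypothetical sketch)"
    )
    parser.add_argument("--model_path", required=True,
                        help="Path to the model weights")
    parser.add_argument("--npu_device", default="npu:0",
                        help="NPU device ID")
    parser.add_argument("--num_warmup", type=int, default=3,
                        help="Number of NPU warm-up rounds")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.model_path, args.npu_device, args.num_warmup)
```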

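7. Precision Evaluation

The same input text is run through inference on CPU (FP32) and on NPU (FP32), and the output CLS token embedding vectors are compared.

Metric	Value
Vector-level relative error	`0.136400%`
Cosine similarity	`0.9999988079`
Pass threshold	`< 1%`

Metric	Measured	Threshold	Status
Vector-level relative error	`0.14%`	< 1%	PASS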
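
The exact formula used by benchmark.py is not stated here. A common definition, assumed in the sketch below, takes the vector-level relative error as the L2 norm of the difference divided by the L2 norm of the CPU reference. The function names are hypothetical, and the code works on plain Python lists so it does not depend on torch.

```python
import math


def relative_error(reference, candidate):
    """L2 norm of (candidate - reference) divided by L2 norm of reference."""
    diff = math.sqrt(sum((c - r) ** 2 for r, c in zip(reference, candidate)))
    norm = math.sqrt(sum(r * r for r in reference))
    return diff / norm


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def passes(reference, candidate, threshold=0.01):
    """Apply the < 1% pass threshold to the relative error."""
    return relative_error(reference, candidate) < threshold


if __name__ == "__main__":
    cpu = [1.0, 2.0, 2.0]    # stand-in for the CPU CLS embedding
    npu = [1.0, 2.0, 2.003]  # stand-in for the NPU CLS embedding
    print(f"relative error: {relative_error(cpu, npu):.6%}")
    print(f"cosine similarity: {cosine_similarity(cpu, npu):.10f}")
    print("PASS" if passes(cpu, npu) else "FAIL")
```

Under this definition, the reported 0.1364% relative error sits well below the 1% threshold.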

8. Performance Data

Operation	Time
CPU inference time (FP32)	`0.30 s`
NPU inference time (FP32, after 3 warm-up rounds)	`0.03 s`
Speedup (CPU / NPU)	`9.48 x`
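
The timing procedure can be pictured as: run a few untimed warm-up calls, then average the wall-clock time of the measured calls. The sketch below illustrates that idea with hypothetical helpers `time_after_warmup` and `speedup`; it is not the code from benchmark.py, and `num_runs` is an illustrative parameter that the script does not expose.

```python
import time


def time_after_warmup(fn, num_warmup=3, num_runs=10):
    """Average seconds per call of fn, measured after untimed warm-up calls."""
    for _ in range(num_warmup):
        fn()  # warm-up rounds absorb one-time compilation cost
    start = time.perf_counter()
    for _ in range(num_runs):
        fn()
    return (time.perf_counter() - start) / num_runs


def speedup(cpu_seconds, npu_seconds):
    """Speedup defined as CPU time divided by NPU time."""
    return cpu_seconds / npu_seconds


if __name__ == "__main__":
    # A dummy workload stands in for a model forward pass.
    elapsed = time_after_warmup(lambda: sum(range(100_000)))
    print(f"average time per call: {elapsed:.6f} s")
```

When timing a real NPU forward pass, `fn` should block until the result is ready (for example by calling `torch.npu.synchronize()` before returning), since device execution is asynchronous. The rounded timings in the table give a ratio of about 10; the reported 9.48x presumably comes from the unrounded measurements.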

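9. Notes

HuggingFace network issues: if HuggingFace cannot be reached, set export HF_ENDPOINT=https://hf-mirror.com/.
NPU warm-up: the first NPU inference includes compilation and optimization, so 1-2 warm-up rounds are usually needed before performance stabilizes. benchmark.py runs 3 warm-up rounds by default (--num_warmup).
Weight files: the model weights (.safetensors) are not included in the adaptation repository and must be downloaded separately from HuggingFace.
ModernBERT: supports sequences of up to 2048 tokens, with 22 Transformer layers.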