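NeoBERT is a high-performance BERT variant developed by chandar-lab. It adopts several modern optimizations: the SwiGLU activation function, RMSNorm normalization, and rotary position embeddings (RoPE). The model maps text into a 768-dimensional dense vector space and is suited to natural language understanding, text classification, and embedding extraction.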
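
```text
NeoBERT-ascend/
├── inference.py        # Inference test script
├── neobert_module/     # Adapted model code
│   ├── model.py        # Main model code (SwiGLU replaced with a native PyTorch implementation)
│   └── rotary.py       # RoPE (rotary position embedding) implementation
├── log.txt             # Test log
└── README.md           # This document
```

Enter the container and load the Ascend CANN environment:

```bash
docker exec -it test-modelagent bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

The model files are located in `/data/ysws/agentsp/5-15/NeoBERT/`.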
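
Before running the tests, it can help to confirm that PyTorch actually sees the NPUs inside the container. The sketch below assumes `torch_npu` is installed (importing it registers the `npu` device and the `torch.npu` namespace) and that `torch.npu.get_device_properties` exposes `total_memory` in bytes, as `torch.cuda` does. `describe_npus` is a hypothetical helper, not part of this repository; its output only loosely mirrors the device lines in the test logs.

```python
def describe_npus() -> list[str]:
    """Return one summary line per visible NPU, or an empty list if none are usable."""
    try:
        import torch
        import torch_npu  # noqa: F401  (registers the "npu" device and torch.npu)
    except (ImportError, OSError):
        # torch/torch_npu not installed, or CANN libraries not on the library path
        return []
    if not torch.npu.is_available():
        return []
    lines = []
    for i in range(torch.npu.device_count()):
        props = torch.npu.get_device_properties(i)
        gib = props.total_memory / 1024**3  # assumption: total_memory is in bytes
        lines.append(f"NPU {i}: {torch.npu.get_device_name(i)}, total_memory={gib:.1f}GB")
    return lines


if __name__ == "__main__":
    for line in describe_npus() or ["No NPU detected"]:
        print(line)
```

An empty list usually means `set_env.sh` was not sourced in the current shell or `torch_npu` is missing.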
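
xformers dependency issue: NeoBERT's original code uses `xformers.ops.SwiGLU`, which has compatibility problems in the CANN environment. It has been replaced with a native PyTorch implementation:

```python
# Original code (xformers)
from xformers.ops import SwiGLU
self.ffn = SwiGLU(config.hidden_size, config.intermediate_size)

# Adapted code (native PyTorch), in the encoder layer's __init__
# (intermediate_size: the FFN hidden width as defined in model.py)
self.ffn_w1 = nn.Linear(config.hidden_size, intermediate_size, bias=False)  # gate projection
self.ffn_w3 = nn.Linear(config.hidden_size, intermediate_size, bias=False)  # up projection
self.ffn_w2 = nn.Linear(intermediate_size, config.hidden_size, bias=False)  # down projection
self.ffn_silu = nn.SiLU()

# In forward():
# x = self.ffn_w2(self.ffn_silu(self.ffn_w1(x)) * self.ffn_w3(x))
```

Run the inference test (it outputs the masked-language-model logits):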

```bash
cd /data/ysws/agentsp/5-15/NeoBERT-ascend/
python3 inference.py --mode inference --device npu:0
```

Run the precision comparison test to verify that the NPU results agree with the CPU results:

```bash
cd /data/ysws/agentsp/5-15/NeoBERT-ascend/
python3 inference.py --mode precision_test
```

| Parameter | Description | Default |
|---|---|---|
| `--mode` | Test mode: `inference` or `precision_test` | `inference` |
| `--device` | Device to run on | `npu:0` (auto-detected) |
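
The parameter table can be read as the command-line interface below. This is a hypothetical sketch of how such flags and the "auto-detected" default could be implemented, not the actual code of `inference.py`: `detect_device` and `parse_args` are illustrative names, and treating "`torch_npu` is importable" as the detection rule is an assumption.

```python
import argparse
import importlib.util


def detect_device() -> str:
    """Assumed rule: use npu:0 when torch_npu is installed, otherwise fall back to CPU."""
    # find_spec checks availability without importing the (heavy) package
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu:0"
    return "cpu"


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="NeoBERT NPU test (hypothetical sketch)")
    parser.add_argument("--mode", choices=["inference", "precision_test"], default="inference",
                        help="test mode")
    parser.add_argument("--device", default=None,
                        help="device to run on; auto-detected when omitted")
    args = parser.parse_args(argv)
    if args.device is None:
        args.device = detect_device()
    return args
```

Passing `argv` explicitly (instead of reading `sys.argv`) keeps the parser easy to test.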

| Metric | Measured | Threshold | Status |
|---|---|---|---|
| Cosine similarity | 0.9989 | > 0.99 | PASS |
| Angular error (1 - cosine similarity) | 0.11% | < 1.00% | PASS |
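
The metrics in the table, and the relative errors reported in the precision log, can be computed as in the sketch below. `compare_logits` is a hypothetical helper; `inference.py` may aggregate differently (for example, the log reports a *mean* cosine similarity, which may be averaged per token rather than computed over the flattened tensor). The example feeds in the five sample CPU/NPU logits printed in the precision log.

```python
import numpy as np


def compare_logits(ref, out, eps: float = 1e-12) -> dict:
    """Compare a reference tensor (e.g. CPU logits) with a test tensor (e.g. NPU logits)."""
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    # Element-wise relative error |out - ref| / |ref| (eps guards against division by zero)
    rel = np.abs(out - ref) / (np.abs(ref) + eps)
    a, b = ref.ravel(), out.ravel()
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return {
        "max_rel_error": float(rel.max()),
        "mean_rel_error": float(rel.mean()),
        "cosine_similarity": cosine,
        "one_minus_cosine": 1.0 - cosine,  # reported as "angular error" in the table
        "passed": cosine > 0.99,           # threshold from the table
    }


# Sample logits[0, 0, :5] from the precision-test log
cpu = [-12.592223, -12.588418, -12.59118, -12.587829, -12.588731]
npu = [-12.490069, -12.485993, -12.488589, -12.485324, -12.48616]
print(compare_logits(cpu, npu))
```

On these five values the element-wise relative error is about 0.8% while the cosine similarity is essentially 1, which illustrates how the two kinds of metric can disagree.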

| Operation | Time |
|---|---|
| CPU inference time (1 sentence) | 0.634s |
| NPU inference time (1 sentence) | 0.287s |
| NPU speedup | ~2.2x |

| Input sentence | Output shape (batch, seq_len, vocab_size) | Inference time |
|---|---|---|
| "This is a test sentence..." | [1, 13, 30522] | 0.287s |
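
The output above is masked-LM logits over the 30,522-token vocabulary, not a sentence embedding. One common way to obtain a 768-dimensional sentence vector is masked mean pooling over the encoder's last hidden states. The sketch below assumes you have already extracted those hidden states as a `[batch, seq_len, hidden]` array together with the tokenizer's attention mask; how to get them out of `NeoBERTLMHead` is not covered here, and mean pooling is one option, not necessarily the pooling NeoBERT's authors recommend.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors while ignoring padding positions.

    hidden_states: [batch, seq_len, hidden]; attention_mask: [batch, seq_len], 1 = real token.
    Returns [batch, hidden].
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # [batch, seq_len, 1]
    summed = (hidden_states * mask).sum(axis=1)                    # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                 # avoid division by zero
    return summed / counts


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)
```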

Inference test (`--mode inference`):

```text
2026-05-15 14:11:12,148 - INFO - ============================================================
2026-05-15 14:11:12,148 - INFO - NeoBERT NPU 推理测试
2026-05-15 14:11:12,148 - INFO - ============================================================
2026-05-15 14:11:12,148 - INFO - Model dir: /data/ysws/agentsp/5-15/NeoBERT
2026-05-15 14:11:12,148 - INFO - Output dir: /data/ysws/agentsp/5-15/NeoBERT-ascend
2026-05-15 14:11:12,148 - INFO - NPU available: True
2026-05-15 14:11:12,149 - INFO - NPU device count: 8
2026-05-15 14:11:13,762 - INFO - NPU 0: Ascend910B3, total_memory=61.0GB
2026-05-15 14:11:13,764 - INFO - NPU 1: Ascend910B3, total_memory=61.0GB
2026-05-15 14:11:13,764 - INFO - ============================================================
2026-05-15 14:11:13,764 - INFO - Inference Test on npu:0
2026-05-15 14:11:13,764 - INFO - ============================================================
2026-05-15 14:11:18,426 - INFO - Device: npu:0
2026-05-15 14:11:18,426 - INFO - Loading tokenizer...
2026-05-15 14:11:18,967 - INFO - Tokenizer loaded: BertTokenizer
2026-05-15 14:11:18,967 - INFO - Loading model...
2026-05-15 14:11:23,705 - INFO - Model weights loaded
2026-05-15 14:11:24,957 - INFO - Model loaded successfully
2026-05-15 14:11:24,957 - INFO - Processing 3 sentences...
2026-05-15 14:11:24,982 - INFO - Input IDs shape: torch.Size([3, 16])
2026-05-15 14:11:25,414 - INFO - Inference time: 0.433s
2026-05-15 14:11:25,415 - INFO - Logits shape: torch.Size([3, 16, 30522])
2026-05-15 14:11:25,595 - INFO - Sample logits[0,0,:5]: [-12.375, -12.375, -12.375, -12.375, -12.375]
2026-05-15 14:11:25,641 - INFO - ============================================================
2026-05-15 14:11:25,642 - INFO - INFERENCE RESULT
2026-05-15 14:11:25,642 - INFO - ============================================================
2026-05-15 14:11:25,642 - INFO - Output shape: torch.Size([3, 16, 30522])
2026-05-15 14:11:25,642 - INFO - Inference time: 0.433s
2026-05-15 14:11:25,642 - INFO - ============================================================
2026-05-15 14:11:25,642 - INFO - Test Complete!
2026-05-15 14:11:25,642 - INFO - ============================================================
```

Precision test (`--mode precision_test`):

```text
2026-05-15 14:11:49,807 - INFO - ============================================================
2026-05-15 14:11:49,807 - INFO - NeoBERT NPU 推理测试
2026-05-15 14:11:49,808 - INFO - ============================================================
2026-05-15 14:11:49,808 - INFO - Model dir: /data/ysws/agentsp/5-15/NeoBERT
2026-05-15 14:11:49,808 - INFO - Output dir: /data/ysws/agentsp/5-15/NeoBERT-ascend
2026-05-15 14:11:49,808 - INFO - NPU available: True
2026-05-15 14:11:49,809 - INFO - NPU device count: 8
2026-05-15 14:11:51,463 - INFO - NPU 0: Ascend910B3, total_memory=61.0GB
2026-05-15 14:11:51,464 - INFO - NPU 1: Ascend910B3, total_memory=61.0GB
2026-05-15 14:11:51,464 - INFO - ============================================================
2026-05-15 14:11:51,464 - INFO - Precision Test: CPU vs NPU (threshold: 1.0%)
2026-05-15 14:11:51,464 - INFO - ============================================================
2026-05-15 14:11:56,229 - INFO - Loading tokenizer...
2026-05-15 14:11:56,799 - INFO - Loading model...
2026-05-15 14:12:06,804 - INFO - Running inference on CPU...
2026-05-15 14:12:07,494 - INFO - Running inference on NPU...
2026-05-15 14:12:08,026 - INFO - Logits CPU dtype: torch.float32, shape: torch.Size([1, 13, 30522])
2026-05-15 14:12:08,026 - INFO - Logits NPU dtype: torch.float32, shape: torch.Size([1, 13, 30522])
2026-05-15 14:12:08,027 - INFO - Sample CPU logits[0,0,:5]: [-12.592223 -12.588418 -12.59118 -12.587829 -12.588731]
2026-05-15 14:12:08,027 - INFO - Sample NPU logits[0,0,:5]: [-12.490069 -12.485993 -12.488589 -12.485324 -12.48616 ]
2026-05-15 14:12:08,033 - INFO - CPU inference time: 0.689s
2026-05-15 14:12:08,033 - INFO - NPU inference time: 0.288s
2026-05-15 14:12:08,034 - INFO - Max relative error: 4.429321e-02 (4.4293%)
2026-05-15 14:12:08,034 - INFO - Mean relative error: 2.666351e-02 (2.6664%)
2026-05-15 14:12:08,034 - INFO - Mean cosine similarity: 0.999565 (0.0435% angular error)
2026-05-15 14:12:08,034 - INFO - PASS: True (threshold: 1.0%, cosine similarity: 0.999565)
2026-05-15 14:12:08,158 - INFO - ============================================================
2026-05-15 14:12:08,158 - INFO - PRECISION TEST RESULT
2026-05-15 14:12:08,158 - INFO - ============================================================
2026-05-15 14:12:08,158 - INFO - Relative error: 4.345179e-04
2026-05-15 14:12:08,158 - INFO - CPU time: 0.689s
2026-05-15 14:12:08,158 - INFO - NPU time: 0.288s
2026-05-15 14:12:08,158 - INFO - PASS: True
2026-05-15 14:12:08,158 - INFO - ============================================================
2026-05-15 14:12:08,158 - INFO - Test Complete!
2026-05-15 14:12:08,158 - INFO - ============================================================
```

The full test logs are saved in log.txt and log_precision.txt, respectively.
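
```python
import os
import sys

import torch
import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
from transformers import AutoTokenizer

OUTPUT_DIR = "/data/ysws/agentsp/5-15/NeoBERT-ascend"
MODEL_DIR = "/data/ysws/agentsp/5-15/NeoBERT"

# Use the adapted model code (native PyTorch SwiGLU) instead of the original remote code
sys.path.insert(0, os.path.join(OUTPUT_DIR, "neobert_module"))
from model import NeoBERTLMHead, NeoBERTConfig  # noqa: E402

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, trust_remote_code=True)
config = NeoBERTConfig.from_pretrained(MODEL_DIR, trust_remote_code=True)

# NOTE: this only builds the architecture with randomly initialized weights.
# Load the checkpoint weights from MODEL_DIR before inference, as inference.py does;
# the FFN parameter names (ffn_w1/ffn_w3/ffn_w2) differ from the xformers-based original.
model = NeoBERTLMHead(config=config)
model = model.to("npu:0")
model.eval()

sentences = ["This is a test sentence", "Each sentence is processed"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

print(f"Logits shape: {outputs.logits.shape}")  # torch.Size([2, seq_len, 30522])
```

```python
import numpy as np

# outputs_cpu / outputs_npu: outputs of the same model and inputs run on "cpu" and "npu:0"
logits_cpu = outputs_cpu.logits.float().cpu().numpy()
logits_npu = outputs_npu.logits.float().cpu().numpy()

# Cosine similarity over all flattened logits, accumulated in float64
flat_cpu = logits_cpu.ravel().astype(np.float64)
flat_npu = logits_npu.ravel().astype(np.float64)
cosine_sim = np.dot(flat_cpu, flat_npu) / (np.linalg.norm(flat_cpu) * np.linalg.norm(flat_npu))
print(f"Cosine similarity: {cosine_sim:.6f}")  # > 0.99 counts as a pass
```

| Component | Description |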
|---|---|
| embeddings | Token embedding layer (vocab_size=30522) |
| layers | 28 Transformer encoder layers |
| SwiGLU | Feed-forward network (w1, w3, w2 + SiLU) |
| RMSNorm | Per-layer / per-attention-block normalization |
| RoPE | Rotary position embedding |
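
For unit-testing the adapted layers against an independent reference, the three components in the table can be written in a few lines of NumPy. This is a minimal sketch, not the code in `model.py` or `rotary.py`: the SwiGLU formula follows the adapted forward `w2(silu(w1(x)) * w3(x))` with weights in `nn.Linear` layout `[out_features, in_features]`, `eps=1e-5` matches `norm_eps` in config.json, and the RoPE variant (LLaMA-style interleaved pairs, base 10000) is an assumption that `rotary.py` may not share.

```python
import numpy as np


def silu(x):
    # SiLU(x) = x * sigmoid(x), with sigmoid written via tanh to avoid exp overflow
    return x * 0.5 * (1.0 + np.tanh(0.5 * x))


def swiglu_ffn(x, w1, w3, w2):
    """w2(silu(w1 x) * w3 x); weights use nn.Linear layout [out_features, in_features]."""
    return (silu(x @ w1.T) * (x @ w3.T)) @ w2.T


def rms_norm(x, weight, eps=1e-5):
    """Divide each vector by its root-mean-square, then scale by a learned weight."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def rope(x, positions, base=10000.0):
    """Rotate adjacent pairs (x[2i], x[2i+1]) by angle pos * base**(-2i/d).

    x: [seq_len, head_dim] with even head_dim; positions: [seq_len].
    Assumption: interleaved-pair convention and base 10000.
    """
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)          # [d/2]
    angles = np.asarray(positions)[:, None] * inv_freq    # [seq_len, d/2]
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin            # 2D rotation of each pair
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out
```

Because each pair is rotated by an angle proportional to its position, the dot product between a rotated query and key depends only on their relative offset, which is the property RoPE is designed to provide.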
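
Key parameters extracted from config.json:

```json
{
  "hidden_size": 768,
  "intermediate_size": 3072,
  "num_attention_heads": 12,
  "num_hidden_layers": 28,
  "max_length": 4096,
  "vocab_size": 30522,
  "dim_head": 64,
  "norm_eps": 1e-05
}
```

A: For models that combine SwiGLU, RMSNorm, and RoPE, we recommend cosine similarity rather than maximum relative error as the primary accuracy metric: element-wise relative error is sensitive to individual outliers and to reference values close to zero, whereas cosine similarity measures how well the output agrees as a whole.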
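
A: Batching can significantly improve throughput. Note also that the first inference incurs compilation overhead, so subsequent runs are faster. For a single sentence, NeoBERT inference on the NPU is about 2.2x faster than on the CPU.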
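
When timing NPU inference, the warm-up point above matters, and so does synchronization: as with CUDA, NPU operations are typically launched asynchronously, so the clock should only stop after the device has finished. The helper below is a hypothetical sketch; `benchmark` is an illustrative name, and passing `torch.npu.synchronize` (provided once `torch_npu` is imported) as `sync` is an assumption about your setup.

```python
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 10,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return mean wall-clock seconds per call of fn, excluding warm-up calls."""
    for _ in range(warmup):
        fn()  # early calls absorb one-time costs such as graph compilation
    if sync is not None:
        sync()  # make sure warm-up work has finished before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for queued device work before stopping the clock
    return (time.perf_counter() - start) / iters


# Hypothetical usage on the NPU (requires torch_npu; model/inputs as in the usage example):
# with torch.no_grad():
#     seconds = benchmark(lambda: model(**inputs), sync=torch.npu.synchronize)
```

Timing a batch of several sentences with the same helper and dividing by the batch size gives a per-sentence figure that reflects the throughput gain from batching.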

This project is licensed under the Apache-2.0 license.