facebook/dinov2-base NPU Adaptation

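Model Information

Model name: facebook/dinov2-base
Model type: Vision Transformer (image feature extraction)
Original repository: facebook/dinov2
Original weights: facebook/dinov2-base
Adaptation method: Monkey-patch (replaces torch.cuda with torch.npu at runtime; the original library code is not modified)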

Requirements

Python >= 3.8
PyTorch >= 2.0
torch_npu (Ascend NPU extension for PyTorch)
transformers
Pillow

Installing Dependencies

pip install torch transformers Pillow
# Install torch_npu by following the documentation for your Ascend environment

Quick Start

1. Download the model weights

modelscope download --model facebook/dinov2-base --local_dir /path/to/weight

2. Run inference

$ python inference.py /opt/atomgit/weight/dinov2-base
NPU monkey-patch applied: torch.cuda -> torch.npu
Loading model from /opt/atomgit/weight/dinov2-base...
Running inference...
Inference completed in 0.1774s
last_hidden_state shape: torch.Size([1, 257, 768])
pooler_output shape: torch.Size([1, 768])
pooler_output first 5 values: [-2.2141345  -0.4613866   1.0883119  -1.3403668  -0.01957275]

Input: sample.jpg, a real image from the COCO dataset (two cats)

(Image: sample inference input)

Output:

last_hidden_state: shape [1, 257, 768]
pooler_output: shape [1, 768]

The full log is in output/inference.log.
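The output shapes above can be checked with a little arithmetic. The sketch below assumes that the image processor crops the input to 224×224 and that the model uses 14×14 patches with a hidden size of 768; these values are not stated in this document, and the function name dinov2_output_shapes is hypothetical.

```python
def dinov2_output_shapes(batch_size=1, image_size=224, patch_size=14, hidden_size=768):
    """Return the expected (last_hidden_state, pooler_output) shapes.

    image_size, patch_size and hidden_size are assumed defaults, not values
    read from the checkpoint.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size  # 224 // 14 = 16
    num_patches = patches_per_side ** 2          # 16 * 16 = 256
    seq_len = num_patches + 1                    # one extra [CLS] token
    return (batch_size, seq_len, hidden_size), (batch_size, hidden_size)


last_hidden_state_shape, pooler_output_shape = dinov2_output_shapes()
print(last_hidden_state_shape)  # (1, 257, 768)
print(pooler_output_shape)      # (1, 768)
```

Under these assumptions, the 257 tokens are 256 patch tokens plus one class token, and pooler_output holds one 768-dimensional vector per image.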

3. Verify accuracy

python accuracy.py /path/to/weight

4. Run the performance benchmark

python benchmark.py /path/to/weight

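Adaptation Notes

This adaptation uses a monkey-patch: torch.cuda is replaced with torch.npu at runtime, so the DINOv2 model runs on Ascend NPUs without any changes to the transformers library source.

Adaptation code snippet: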

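import torch  # torch.npu is provided by torch_npu

def apply_npu_monkey_patch():
    """Alias torch.cuda to torch.npu when an Ascend NPU is available."""
    if hasattr(torch, "npu") and torch.npu.is_available():
        # From here on, every torch.cuda.* lookup resolves to torch.npu.*
        torch.cuda = torch.npu
        # Redundant after the alias above; kept to make the intent explicit
        torch.cuda.is_available = torch.npu.is_available
        if hasattr(torch.npu, "current_device"):
            torch.cuda.current_device = torch.npu.current_device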
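The effect of this kind of patch can be demonstrated without any NPU hardware. The following sketch uses plain stand-in objects instead of the real torch modules; fake_torch, fake_cuda, fake_npu, apply_patch and library_code are all hypothetical names introduced only for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for torch.npu and torch.cuda (illustration only).
fake_npu = SimpleNamespace(
    is_available=lambda: True,
    current_device=lambda: 0,
    device_count=lambda: 1,
)
fake_cuda = SimpleNamespace(
    is_available=lambda: False,
    current_device=lambda: -1,
    device_count=lambda: 0,
)
fake_torch = SimpleNamespace(cuda=fake_cuda, npu=fake_npu)


def apply_patch(framework):
    """Point framework.cuda at framework.npu when an NPU backend is usable."""
    npu = getattr(framework, "npu", None)
    if npu is not None and npu.is_available():
        framework.cuda = npu
        return True
    return False


def library_code(framework):
    """Mimics third-party code that only knows about the cuda namespace."""
    return framework.cuda.is_available(), framework.cuda.device_count()


before = library_code(fake_torch)  # cuda namespace reports no device
patched = apply_patch(fake_torch)
after = library_code(fake_torch)   # same call now reaches the npu namespace
print(before, patched, after)
```

Because the patch rebinds an attribute, it only affects code that looks up the cuda attribute after the patch runs. A module that stored its own reference to the old object beforehand would keep using it, so a patch like this would normally be applied before the model is loaded.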

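Files

File	Description
`inference.py`	NPU inference script with monkey-patch adaptation
`accuracy.py`	Accuracy verification script; compares the error between CPU and NPU outputs
`benchmark.py`	Performance benchmark script with CPU and NPU modes
`readme.md`	Deployment documentation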

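Accuracy Verification Results

Run accuracy.py to compare the accuracy of CPU and NPU outputs.

Pass criterion: maximum relative error < 1% or cosine similarity > 0.99.

Measured results (using a real image from the COCO dataset):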

$ python accuracy.py /opt/atomgit/weight/dinov2-base
CPU vs NPU eager (last_hidden_state):
  Max absolute error: 1.065415e-01
  Max relative error: 2.921601e+03
  Mean relative error: 6.339373e-02
  Cosine similarity: 0.99998308
  Status: PASS (threshold: max_rel < 1% or cos_sim > 0.99)

CPU vs NPU eager (pooler_output):
  Max absolute error: 5.000663e-02
  Max relative error: 8.880741e+00
  Mean relative error: 4.494707e-02
  Cosine similarity: 0.99996044
  Status: PASS (threshold: max_rel < 1% or cos_sim > 0.99)

=== Summary ===
NPU eager accuracy: PASS

Metric	last_hidden_state	pooler_output
Max absolute error	1.07e-01	5.00e-02
Max relative error	2.92e+03	8.88e+00
Mean relative error	6.34e-02	4.49e-02
Cosine similarity	0.99998308	0.99996044
Status	PASS	PASS

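Note: NPU and CPU floating-point implementations differ inherently, so the relative error of elements close to zero is amplified. The maximum relative error therefore exceeds the 1% threshold, and the PASS status comes from the cosine-similarity criterion: both cosine similarities are above 0.99996, which indicates that the feature vectors point in nearly the same direction. The full log is in output/accuracy.log.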
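The amplification of relative error near zero can be reproduced on a toy vector. The sketch below is not the code of accuracy.py; it assumes an element-wise relative error of |ref - out| / (|ref| + eps), and the names compare_outputs and passes and the sample values are hypothetical.

```python
import math


def compare_outputs(reference, candidate, eps=1e-12):
    """Compute error metrics between two equally long lists of floats."""
    abs_err = [abs(r - c) for r, c in zip(reference, candidate)]
    # Assumed definition: element-wise error relative to the reference value.
    rel_err = [a / (abs(r) + eps) for a, r in zip(abs_err, reference)]
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm_ref = math.sqrt(sum(r * r for r in reference))
    norm_cand = math.sqrt(sum(c * c for c in candidate))
    return {
        "max_abs": max(abs_err),
        "max_rel": max(rel_err),
        "mean_rel": sum(rel_err) / len(rel_err),
        "cos_sim": dot / (norm_ref * norm_cand),
    }


def passes(metrics, rel_threshold=0.01, cos_threshold=0.99):
    """Pass when either criterion from this document holds."""
    return metrics["max_rel"] < rel_threshold or metrics["cos_sim"] > cos_threshold


# Made-up values: the third element is close to zero in the reference.
cpu_out = [2.0, -1.5, 1e-5, 0.75]
npu_out = [2.001, -1.499, 1e-3, 0.7505]

metrics = compare_outputs(cpu_out, npu_out)
for name, value in metrics.items():
    print(f"{name}: {value:.6e}")
print("PASS" if passes(metrics) else "FAIL")
```

In this toy case a single near-zero element pushes the maximum relative error far above 1%, while the cosine similarity stays very close to 1, which mirrors the pattern in the measured results.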

Performance Benchmark Results

Run benchmark.py to compare latency in the two modes, CPU eager and NPU eager:

$ python benchmark.py /opt/atomgit/weight/dinov2-base
Benchmarking model: /opt/atomgit/weight/dinov2-base
PyTorch version: 2.9.0+cpu
NPU available: True

=== CPU Eager Mode ===
Runs: 10, Avg latency: 813.70 ms, Std: 12.24 ms

=== NPU Eager Mode ===
Runs: 10, Avg latency: 6.63 ms, Std: 0.07 ms

=== Summary ===
CPU eager:  813.70 ms
NPU eager:  6.63 ms
Speedup vs CPU: 122.65x

Mode	Average latency
CPU	813.70 ms
NPU	6.63 ms
Speedup	122.65x

The full log is in output/benchmark.log.
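A latency measurement of this kind can be sketched as follows. This is not the code of benchmark.py; the warm-up count, the use of the sample standard deviation, and the names benchmark and speedup are assumptions made for illustration.

```python
import statistics
import time


def benchmark(fn, runs=10, warmup=3, sync=None):
    """Return (mean_ms, std_ms) over `runs` timed calls of fn.

    sync is an optional callable that blocks until the device is idle.
    On an asynchronous device it should be called before the timer is read
    (with torch_npu, torch.npu.synchronize would be a candidate).
    """
    for _ in range(warmup):  # untimed warm-up calls
        fn()
    if sync is not None:
        sync()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(latencies_ms), statistics.stdev(latencies_ms)


def speedup(baseline_ms, accelerated_ms):
    """Ratio of baseline latency to accelerated latency."""
    return baseline_ms / accelerated_ms


# A CPU-only stand-in workload so the sketch runs anywhere.
mean_ms, std_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"Runs: 10, Avg latency: {mean_ms:.2f} ms, Std: {std_ms:.2f} ms")
print(f"Speedup from rounded latencies: {speedup(813.70, 6.63):.2f}x")
```

Recomputing the ratio from the rounded latencies in the table gives about 122.73x. The logged 122.65x was presumably computed from the unrounded measurements.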

Notes

The sample image used by the inference script is a public COCO dataset image (000000039769.jpg); it is not randomly generated.