Web-SSL DINO ViT-1B on Ascend NPU

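1. Introduction

This document records the quick adaptation and validation results for Web-SSL DINO ViT-1B (facebook/webssl-dino1b-full2b-224) on Ascend NPUs. The model is a 1-billion-parameter Vision Transformer trained with DINOv2 self-supervised learning. It takes 224×224 input and outputs an image-level CLS token feature plus patch-wise features.

The adaptation loads the model with stock transformers and then moves it directly to the NPU (model.to("npu")). No changes to third-party library source code and no monkey-patching are required.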
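
The loading path described above can be sketched roughly as follows. This is a minimal illustration, not the repository's inference.py: the helper names pick_device, load_model, and extract_features are hypothetical, loading through AutoImageProcessor and AutoModel is an assumption about how the checkpoint is consumed, and attn_implementation="eager" is taken from the notes in section 9.

```python
def pick_device() -> str:
    """Return "npu" when torch_npu is installed and an NPU is visible, else "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers the "npu" backend)
    except ImportError:
        return "cpu"
    return "npu" if torch.npu.is_available() else "cpu"


def load_model(model_path: str, device: str):
    """Load the processor and model with stock transformers, then move the model."""
    # Imported lazily so that pick_device() works without transformers installed.
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_path)
    model = AutoModel.from_pretrained(model_path, attn_implementation="eager")
    return processor, model.to(device).eval()


def extract_features(processor, model, image, device: str):
    """Run one forward pass and split the output into CLS and patch features."""
    import torch

    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (batch, 1 + num_patches, dim)
    # Token 0 is the CLS token; the remaining tokens are the patch features.
    return hidden[:, 0], hidden[:, 1:]
```

Falling back to "cpu" when torch_npu cannot be imported keeps the same code usable for producing a CPU baseline.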

2. Validation environment

Component	Version
`torch`	`2.9.0+cpu`
`torch-npu`	`2.9.0.post1+gitee7ba04`
`transformers`	`4.57.6`
`Pillow`	`11.2.1`
`safetensors`	`0.5.3`

NPU: logical device available
CANN: 8.5.1
Model path: /opt/atomgit/weight/webssl-dino1b-full2b-224
Working directory: /opt/atomgit/webssl-dino1b-full2b-224

3. Downloading the weights

python3 -m atomgit download hf_mirrors/facebook/webssl-dino1b-full2b-224 \
  -d /opt/atomgit/weight/webssl-dino1b-full2b-224

4. Installing dependencies

pip install torch torch-npu transformers Pillow safetensors

5. Inference validation

5.1 Inference script

inference.py

5.2 Run command

cd /opt/atomgit/webssl-dino1b-full2b-224
python3 inference.py \
  --model_path /opt/atomgit/weight/webssl-dino1b-full2b-224 \
  --image_path test_image.png \
  --device npu \
  --output_dir output

5.3 Validation results

Using device: npu
Loading processor and model...
Loading image...
Running inference...
Inference latency: 24.66 ms
CLS features shape: torch.Size([1, 1536])
Patch features shape: torch.Size([1, 256, 1536])
CLS features mean: 0.000016
CLS features std: 1.315659
Output saved to output/inference_output.pt

inference.py ran successfully.
NPU inference completed normally, with no operator errors.
The output feature shapes match expectations (CLS: 1×1536, Patch: 1×256×1536).
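
The expected shapes can be cross-checked with a little arithmetic. The sketch below assumes a patch size of 14 (the usual DINOv2 ViT setting, which this document does not state) and takes the hidden size of 1536 from the log above. expected_shapes is a hypothetical helper, not part of the repository.

```python
def expected_shapes(image_size: int = 224, patch_size: int = 14,
                    hidden_size: int = 1536, batch: int = 1):
    """Return the (CLS, patch) feature shapes for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size   # 224 // 14 = 16 patches per side
    num_patches = per_side * per_side     # 16 * 16 = 256 patches
    return (batch, hidden_size), (batch, num_patches, hidden_size)


cls_shape, patch_shape = expected_shapes()
print(cls_shape, patch_shape)
```

Under these assumptions the 256 patch tokens in the log correspond to a 16×16 grid over the 224×224 input.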

6. Performance benchmark

6.1 Benchmark script

benchmark.py

6.2 Run command

python3 benchmark.py \
  --model_path /opt/atomgit/weight/webssl-dino1b-full2b-224 \
  --image_path test_image.png \
  --device npu \
  --warmup 5 \
  --iterations 20 \
  --output_dir output

6.3 Performance results

Test conditions: a single 224×224 image, 5 warmup iterations, 20 benchmark iterations.

Metric	Value
`device`	`npu`
`warmup`	`5`
`iterations`	`20`
`average_latency`	`24.31 ms`
`throughput`	`41.14 images/s`
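
For a batch of one image, throughput is the reciprocal of the average latency: 1000 / 24.31 ms ≈ 41.14 images/s, which matches the table. The sketch below shows one way such a warmup-then-measure loop could be written. It is not the contents of benchmark.py, and benchmark, run_once, and sync are hypothetical names. On an NPU the sync argument would presumably be torch.npu.synchronize, so that asynchronously queued work finishes before the timer stops.

```python
import time


def benchmark(run_once, warmup: int = 5, iterations: int = 20, sync=None):
    """Time run_once() and return (average latency in ms, throughput in items/s)."""
    def timed_call() -> float:
        start = time.perf_counter()
        run_once()
        if sync is not None:
            sync()  # wait for queued device work before stopping the clock
        return time.perf_counter() - start

    for _ in range(warmup):      # warmup runs are executed but not recorded
        timed_call()
    total = sum(timed_call() for _ in range(iterations))
    avg_seconds = total / iterations
    throughput = 1.0 / avg_seconds if avg_seconds > 0 else float("inf")
    return avg_seconds * 1000.0, throughput


if __name__ == "__main__":
    # Stand-in workload; a real run would call the model on a preprocessed image.
    latency_ms, throughput = benchmark(lambda: sum(range(10_000)))
    print(f"average_latency: {latency_ms:.2f} ms, throughput: {throughput:.2f} images/s")
```

Excluding the warmup iterations from the average keeps one-time costs such as operator compilation and cache population out of the reported latency.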

7. Accuracy evaluation

7.1 Evaluation script

accuracy.py

7.2 Run command

python3 accuracy.py \
  --model_path /opt/atomgit/weight/webssl-dino1b-full2b-224 \
  --image_path test_image.png \
  --output_dir output

7.3 Accuracy results

NPU inference results are compared against CPU inference results, which serve as the baseline:

Metric	CLS Token Features	Patch Features
`max_absolute_error`	`7.675886e-04`	`8.128452e-02`
`mean_absolute_error`	`1.859917e-04`	`4.807501e-04`
`mean_relative_error`	`7.538907e-04`	`3.452694e-03`
`cosine_similarity`	`0.99999988`	`1.00000632`

Accuracy verdict:

Check	Threshold	Actual	Result
CLS relative error	`< 1%`	`0.075%`	✅ PASS
Patch relative error	`< 1%`	`0.345%`	✅ PASS

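Conclusion: the mean relative error between NPU and CPU outputs is below 1% for both CLS and patch features, and the cosine similarity is approximately 1.0 (the patch value of 1.00000632 exceeds 1 only through floating-point rounding), so the accuracy alignment check passes.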
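
The four metrics in the table can be computed along the following lines. The exact definitions used by accuracy.py are not shown in this document, so treat this as a sketch under assumptions: in particular, mean_relative_error is assumed here to be the element-wise absolute error divided by the absolute baseline value plus a small eps, averaged over all elements. compare is a hypothetical helper operating on flattened feature values.

```python
import math


def compare(baseline, candidate, eps: float = 1e-8):
    """Compare two equal-length flat sequences of floats (baseline = CPU output)."""
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_err = [abs(c - b) for b, c in zip(baseline, candidate)]
    # Assumed definition: element-wise |error| / (|baseline| + eps), then averaged.
    rel_err = [e / (abs(b) + eps) for e, b in zip(abs_err, baseline)]
    dot = sum(b * c for b, c in zip(baseline, candidate))
    norm_b = math.sqrt(sum(b * b for b in baseline))
    norm_c = math.sqrt(sum(c * c for c in candidate))
    return {
        "max_absolute_error": max(abs_err),
        "mean_absolute_error": sum(abs_err) / len(abs_err),
        "mean_relative_error": sum(rel_err) / len(rel_err),
        "cosine_similarity": dot / (norm_b * norm_c + eps),
    }


cpu = [1.0, -2.0, 4.0]        # toy stand-in for flattened CPU features
npu = [1.001, -2.002, 3.996]  # toy stand-in for flattened NPU features
metrics = compare(cpu, npu)
passed = metrics["mean_relative_error"] < 0.01  # the 1% threshold used above
print(metrics, "PASS" if passed else "FAIL")
```

With real tensors the same quantities would normally be computed with vectorized tensor operations. Whichever way they are computed, low-precision accumulation can push the cosine similarity marginally above 1.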

8. File structure

.
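├── inference.py              # NPU inference script
├── benchmark.py              # Performance benchmark script
├── accuracy.py               # Accuracy validation script
├── test_image.png            # Test sample (from the original repository)
├── output/
│   ├── inference_output.pt   # Inference output features
│   ├── inference.log         # Inference log
│   ├── benchmark_result.txt  # Benchmark results
│   ├── benchmark.log         # Benchmark log
│   ├── accuracy_result.txt   # Accuracy results
│   └── accuracy.log          # Accuracy log
└── README.md                 # This document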

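9. Notes

Attention implementation: attn_implementation="eager" is currently used and runs stably on the NPU.
Test sample: test_image.png is taken from the original repository (webssl_teaser.png) rather than randomly generated, which keeps the validation results reproducible.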