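This document records the adaptation and validation results for facebook/webssl-dino7b-full8b-224 on Ascend NPUs.

webssl-dino7b-full8b-224 is a large DINOv2-style Vision Transformer released by Meta (about 7 billion parameters, patch size 14), pretrained with self-supervision on web-scale image data as part of the Web-SSL series, with an input resolution of 224×224. It is suited to downstream tasks such as image feature extraction and visual representation learning.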
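
As a quick sanity check on the numbers above, the token sequence length of a ViT follows directly from the input resolution and patch size. The sketch below assumes one extra class token and no register tokens; that assumption is consistent with the `[1, 257, 4096]` last hidden state shape in the inference output in this document, but the helper name `vit_sequence_length` is purely illustrative.

```python
def vit_sequence_length(image_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Return the number of tokens a ViT produces for a square image.

    extra_tokens counts non-patch tokens (assumed here: a single class token).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 14 = 16
    return patches_per_side ** 2 + extra_tokens  # 16 * 16 + 1


if __name__ == "__main__":
    print(vit_sequence_length(224, 14))  # 257
```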

| Component | Version |
|---|---|
| torch | 2.9.0+cpu |
| torch-npu | 2.9.0.post1+gitee7ba04 |
| transformers | >=4.30.0 |
| Pillow | >=9.0.0 |
| numpy | 1.26.4 |
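
To compare a local environment against the table above, the installed versions can be listed with the standard library. This is a minimal sketch: the distribution names are taken from the table, and how `torch-npu` is registered in a given environment is an assumption, so a `None` result may only mean the package is installed under a different name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the component table.
PACKAGES = ["torch", "torch-npu", "transformers", "Pillow", "numpy"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<14}{ver or 'not installed'}")
```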
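
- NPU: Ascend910
- Model weights: `/opt/atomgit/weight/webssl-dino7b-full8b-224`
- Sample image: `sample_image.jpg` (a real COCO val2017 sample, included in the repository)

The sample image used for inference validation is `000000000139.jpg` from the MS COCO 2017 validation set. It is committed to the repository as `sample_image.jpg`, and its MD5 matches the original COCO val2017 image exactly (`a0204aa65acc51cd8ffc128e5e94a05c`). The image is 640 × 426, RGB, JPEG, about 158 KB. Because the `data/` directory is excluded by `.gitignore` (to avoid committing large datasets), a copy of this single sample is committed to the repository root so that the scripts run directly after cloning.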
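
The MD5 claim above can be checked locally with a few lines of standard-library Python. This is a sketch, not part of the repository's scripts; it assumes it is run from the repository root, where `sample_image.jpg` lives.

```python
import hashlib
from pathlib import Path

# MD5 of the COCO val2017 image 000000000139.jpg, as stated in this document.
EXPECTED_MD5 = "a0204aa65acc51cd8ffc128e5e94a05c"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    image = Path("sample_image.jpg")  # assumed location: repository root
    if image.exists():
        actual = file_md5(image)
        print("match" if actual == EXPECTED_MD5 else f"mismatch: {actual}")
    else:
        print(f"{image} not found")
```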
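
```bash
pip install torch==2.9.0 transformers Pillow numpy
```

Note: torch-npu is preinstalled in this environment. When deploying to a new environment, follow the official CANN installation guide to install the matching versions of the CANN driver and firmware.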
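
Before running the scripts in a new environment, it can help to confirm that the NPU backend is actually usable. The following is a hedged sketch: the function name `detect_device` is illustrative, and it assumes that importing `torch_npu` registers the `torch.npu` namespace, which is how torch-npu is normally used. It falls back to `"cpu"` when either package cannot be imported.

```python
def detect_device() -> str:
    """Return "npu" when torch-npu is importable and a device is available, else "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers torch.npu)
    except (ImportError, OSError):
        # Missing package, or missing CANN shared libraries at import time.
        return "cpu"
    return "npu" if torch.npu.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {detect_device()}")
```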

```bash
python3 -m atomgit download hf_mirrors/facebook/webssl-dino7b-full8b-224 -d /opt/atomgit/weight/webssl-dino7b-full8b-224
```

After the download completes, the directory structure should be:

```
/opt/atomgit/weight/webssl-dino7b-full8b-224/
├── config.json
├── model.safetensors.index.json
├── model-00001-of-00006.safetensors
├── model-00002-of-00006.safetensors
├── model-00003-of-00006.safetensors
├── model-00004-of-00006.safetensors
├── model-00005-of-00006.safetensors
└── model-00006-of-00006.safetensors
```

```bash
python3 inference.py
```

- `model.to("npu")` loads the model onto the NPU for inference.
- The input is a real image (`000000000139.jpg`), not randomly generated.

```
[INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)
[INFO] Loading image from: /opt/atomgit/webssl-dino7b-full8b-224/sample_image.jpg
[INFO] Loading model from: /opt/atomgit/weight/webssl-dino7b-full8b-224
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model.
[INFO] Model moved to NPU device: Ascend910_9362
[INFO] Input shape: torch.Size([1, 3, 224, 224])
[INFO] Device: npu:0
[INFO] Warm-up inference...
[INFO] Running timed inference...
============================================================
Inference Results
============================================================
Device: NPU (Ascend910_9362)
Latency: 85.35 ms
Pooled output shape: torch.Size([1, 4096])
Last hidden state shape: torch.Size([1, 257, 4096])
Pooled output dtype: torch.float32
Pooled output device: npu:0
Pooled output first 5 values: [-0.37430924 0.15931404 -1.0104568 -0.57187915 0.12003752]
============================================================
[INFO] Results saved to: /opt/atomgit/webssl-dino7b-full8b-224/output/inference_result.json
[INFO] Log saved to: /opt/atomgit/webssl-dino7b-full8b-224/output/inference_log.txt
```

Inference artifacts are saved under `output/`:

- `inference_result.json`: structured inference results (output tensor shapes, sample values, latency, etc.)
- `inference_log.txt`: inference log text

```bash
python3 benchmark.py
```

```
[INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)
[INFO] Loading image from: /opt/atomgit/webssl-dino7b-full8b-224/sample_image.jpg
[INFO] Loading model from: /opt/atomgit/weight/webssl-dino7b-full8b-224
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model.
[INFO] Model moved to NPU: Ascend910_9362
[INFO] Benchmarking latency (5 warmup + 20 timed)...
============================================================
Latency Benchmark Results
============================================================
Device: NPU (Ascend910_9362)
Iterations: 20
Mean: 85.27 ms
StdDev: 0.03 ms
Min: 85.21 ms
Max: 85.33 ms
P50: 85.27 ms
P90: 85.32 ms
P99: 85.33 ms
============================================================
[INFO] Benchmarking throughput for batch sizes: [1, 2, 4]
batch=1: 11.73 images/sec, avg_latency=85.26 ms
batch=2: 14.23 images/sec, avg_latency=140.58 ms
batch=4: 14.57 images/sec, avg_latency=274.54 ms
============================================================
Throughput Benchmark Results
============================================================
Batch= 1: 11.73 img/s | avg_latency= 85.26 ms
Batch= 2: 14.23 img/s | avg_latency= 140.58 ms
Batch= 4: 14.57 img/s | avg_latency= 274.54 ms
============================================================
[INFO] Results saved to /opt/atomgit/webssl-dino7b-full8b-224/output
```

Performance artifacts are saved under `output/`:
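
- `benchmark_result.json`: structured performance results (latency distribution, throughput data)
- `benchmark_log.txt`: performance log text

| Metric | Value |
|---|---|
| Mean latency | 85.27 ms |
| P50 latency | 85.27 ms |
| P90 latency | 85.32 ms |
| P99 latency | 85.33 ms |
| Throughput (batch size 1) | 11.73 img/s |
| Throughput (batch size 2) | 14.23 img/s |
| Throughput (batch size 4) | 14.57 img/s |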
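
The latency statistics and throughput figures above can be reproduced from raw timings with a small helper. The sketch below is not the repository's `benchmark.py`; the function names are illustrative, the percentile uses linear interpolation, and the standard deviation is the sample standard deviation, both of which are assumptions about how the reported numbers were computed. On an NPU the timed callable should also synchronize the device before returning, since kernel launches are asynchronous.

```python
import statistics
import time


def time_calls(fn, warmup=5, iters=20):
    """Run fn warmup times untimed, then return iters latencies in milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()  # fn should block until the device has finished its work
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def summarize(samples):
    """Collect the statistics shown in the latency table."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
        "p50": percentile(samples, 50),
        "p90": percentile(samples, 90),
        "p99": percentile(samples, 99),
    }


def throughput(batch_size, avg_latency_ms):
    """Images per second for a batch that takes avg_latency_ms to process."""
    return batch_size / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    # Reproduce the reported throughput from the reported average latencies.
    for batch, latency_ms in [(1, 85.26), (2, 140.58), (4, 274.54)]:
        print(f"batch={batch}: {throughput(batch, latency_ms):.2f} img/s")
```

Going from batch size 1 to 4 raises throughput by only about 24% while latency more than triples, which suggests the device is already close to saturation at batch size 1 for this model.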

```bash
python3 accuracy.py
```

Pass criteria: vector-level relative error < 1% and cosine similarity > 0.999.

```
[INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)
[INFO] Loading image from: /opt/atomgit/webssl-dino7b-full8b-224/sample_image.jpg
[INFO] Loading model from: /opt/atomgit/weight/webssl-dino7b-full8b-224
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model.
[INFO] Running CPU baseline inference...
[INFO] CPU baseline done.
[INFO] Running NPU inference...
[INFO] NPU inference done.
============================================================
Accuracy Validation Results
============================================================
pooler_output:
Shape: (1, 4096)
Vector Relative Error: 0.000302 (PASS)
Cosine Similarity: 1.000000 (PASS)
MSE: 0.0000000393
Max Absolute Diff: 0.000776
Overall: PASS
last_hidden_state:
Shape: (1, 257, 4096)
Vector Relative Error: 0.002019 (PASS)
Cosine Similarity: 0.999997 (PASS)
MSE: 0.0000020290
Max Absolute Diff: 0.068947
Overall: PASS
============================================================
OVERALL: PASS (vector-level relative error < 1% and cosine similarity > 0.999)
============================================================
[INFO] Results saved to /opt/atomgit/webssl-dino7b-full8b-224/output
```

Accuracy validation artifacts are saved under `output/`:

- `accuracy_result.json`: structured accuracy comparison results
- `accuracy_log.txt`: accuracy log text

| Output | Vector relative error | Cosine similarity | MSE | Result |
|---|---|---|---|---|
| pooler_output | 0.000302 | 1.000000 | 0.0000000393 | PASS |
| last_hidden_state | 0.002019 | 0.999997 | 0.0000020290 | PASS |
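
The four metrics in the accuracy results can be written out explicitly. The sketch below is dependency-free Python over flattened vectors and is not the repository's `accuracy.py`. It assumes that "vector relative error" means the L2 norm of the difference divided by the L2 norm of the CPU reference; the document does not state the formula, so treat that definition as an assumption.

```python
import math


def compare_vectors(reference, candidate):
    """Compare a candidate (e.g. NPU) vector against a reference (e.g. CPU) vector."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("vectors must be non-empty and of equal length")
    diffs = [c - r for r, c in zip(reference, candidate)]
    ref_norm = math.sqrt(sum(r * r for r in reference))
    cand_norm = math.sqrt(sum(c * c for c in candidate))
    if ref_norm == 0.0 or cand_norm == 0.0:
        raise ValueError("zero-norm vector: relative error and cosine are undefined")
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    dot = sum(r * c for r, c in zip(reference, candidate))
    return {
        "relative_error": diff_norm / ref_norm,  # assumed definition
        "cosine_similarity": dot / (ref_norm * cand_norm),
        "mse": sum(d * d for d in diffs) / len(diffs),
        "max_abs_diff": max(abs(d) for d in diffs),
    }


def passes(metrics, max_relative_error=0.01, min_cosine=0.999):
    """Apply the thresholds stated in this document (< 1% and > 0.999)."""
    return (metrics["relative_error"] < max_relative_error
            and metrics["cosine_similarity"] > min_cosine)


if __name__ == "__main__":
    cpu = [0.5, -1.0, 2.0]
    npu = [0.5001, -1.0002, 1.9999]
    result = compare_vectors(cpu, npu)
    print(result, "PASS" if passes(result) else "FAIL")
```

A norm-based relative error is less sensitive to individual near-zero elements than an element-wise one. That may be why a maximum absolute difference of 0.068947 on `last_hidden_state` still corresponds to a vector relative error of about 0.2%.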
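
- NPU compatibility is achieved through a monkey-patch (`torch.cuda -> torch.npu`) without modifying the original transformers library code, so upgrading transformers usually requires no re-adaptation.
- `torch.compile`: the scripts explicitly disable it through the environment variable `TORCH_COMPILE_DISABLE=1` to avoid potential compatibility issues.
- The message `[LOG_WARNING] can not create directory, directory: /home/atomgit/ascend/log` indicates that the Ascend driver log directory has not been created. It does not affect inference results and can be ignored.
- The scripts refer to the weights at `/opt/atomgit/weight/webssl-dino7b-full8b-224`. Change the `model_path` variable in `inference.py`, `benchmark.py`, and `accuracy.py` to match your actual download path.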
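
Two of the notes above, disabling `torch.compile` and adjusting the model path, can be handled at the top of a script. This is a hedged sketch rather than the repository's code: the environment variable `WEBSSL_MODEL_PATH` and the helper `resolve_model_path` are hypothetical names introduced here for illustration, and setting `TORCH_COMPILE_DISABLE` before `torch` is imported is a precaution rather than a documented requirement of these scripts.

```python
import os

# Set before importing torch so the setting is visible when torch reads it.
# setdefault keeps any value the caller has already exported.
os.environ.setdefault("TORCH_COMPILE_DISABLE", "1")

# Default weight location used throughout this document.
DEFAULT_MODEL_PATH = "/opt/atomgit/weight/webssl-dino7b-full8b-224"


def resolve_model_path(env=None):
    """Return the weight directory, preferring a (hypothetical) override variable."""
    env = os.environ if env is None else env
    return env.get("WEBSSL_MODEL_PATH") or DEFAULT_MODEL_PATH


if __name__ == "__main__":
    print(f"model_path = {resolve_model_path()}")
```

Reading the path from the environment avoids editing three scripts when the weights live somewhere else. A command-line argument would serve the same purpose.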