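This document records the adaptation and validation results for the `timm/vit_base_patch16_224.dino` model on Ascend NPUs.

`vit_base_patch16_224.dino` is a DINO self-supervised Vision Transformer (ViT-Base, patch size 16) released by Meta. It is pretrained on the ImageNet dataset at an input resolution of 224×224 and is suited to downstream tasks such as image feature extraction and visual representation learning.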
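
The patch size and input resolution determine the token sequence length that appears in the inference log (`Features shape: torch.Size([1, 197, 768])`). The following is a minimal sketch of that arithmetic, assuming the standard ViT layout of non-overlapping square patches plus one class token. The helper name `vit_token_count` is illustrative and is not part of timm.

```python
def vit_token_count(image_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Number of tokens a ViT produces for a square input image.

    Assumes non-overlapping square patches and `num_prefix_tokens`
    extra tokens (one class token in the ViT-Base layout assumed here).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + num_prefix_tokens


if __name__ == "__main__":
    # 224 / 16 = 14 patches per side, 14 * 14 = 196 patches, plus 1 class token.
    print(vit_token_count(224, 16))  # 197
```

The last dimension of the feature tensor (768) is the ViT-Base embedding width and does not depend on the input resolution.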

| Component | Version |
|---|---|
| torch | 2.9.0+cpu |
| torch-npu | 2.9.0.post1+gitee7ba04 |
| timm | 1.0.27 |
| Pillow | >=9.0.0 |
| numpy | 1.26.4 |
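
- NPU: Ascend910
- Weights directory: `/opt/atomgit/weight/vit_base_patch16_224.dino`
- Sample image: `sample_image.jpg` (a real sample from COCO val2017, included in the repository)

The sample image used for inference validation is `000000000139.jpg` from the MS COCO 2017 validation set. It is committed to the repository as `sample_image.jpg`, and its MD5 checksum (`a0204aa65acc51cd8ffc128e5e94a05c`) is identical to that of the original COCO val2017 image. The image is 640 × 426 pixels, RGB, JPEG, and about 158 KB. Because the `data/` directory is excluded by `.gitignore` (to avoid committing large datasets), a copy of this single image is committed to the repository root so that the scripts run immediately after cloning.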
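
The checksum stated for the sample image can be verified after cloning. The following is a minimal sketch using only the standard library; it assumes the script is run from the repository root, and the helper name `md5_of_file` is illustrative rather than something the repository provides.

```python
import hashlib
from pathlib import Path

# Checksum stated in this document for the COCO val2017 image 000000000139.jpg.
EXPECTED_MD5 = "a0204aa65acc51cd8ffc128e5e94a05c"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    sample = Path("sample_image.jpg")  # assumption: current directory is the repo root
    if sample.exists():
        actual = md5_of_file(sample)
        print("MD5 matches" if actual == EXPECTED_MD5 else f"MD5 mismatch: {actual}")
    else:
        print(f"{sample} not found")
```

MD5 is used here only to confirm that the file is byte-identical to the original; it is not a security measure.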
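
Install the Python dependencies with `pip install torch==2.9.0 timm==1.0.27 Pillow numpy`.

Note: torch-npu is preinstalled in this environment. When deploying to a new environment, follow the official CANN installation guide to install the matching versions of the CANN driver and firmware.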
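
After installation, it can help to confirm that the installed packages match the versions listed in the component table. The sketch below uses only `importlib.metadata` from the standard library. The function names are illustrative, and ignoring local build tags such as `+cpu` during comparison is an assumption made for this example.

```python
from importlib import metadata

# Versions taken from the component table in this document.
EXPECTED_VERSIONS = {
    "torch": "2.9.0",
    "timm": "1.0.27",
    "numpy": "1.26.4",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected: dict) -> dict:
    """Map each package name to a tuple (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Local build tags such as "+cpu" are ignored when comparing (assumption).
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions(EXPECTED_VERSIONS).items():
        status = "OK" if ok else "CHECK"
        print(f"{name}: expected {wanted}, installed {found} [{status}]")
```

A `CHECK` status only flags a difference from the validated environment; other versions may still work but were not covered by the results in this document.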

Download the weights with `python3 -m atomgit download hf_mirrors/timm/vit_base_patch16_224.dino -d /opt/atomgit/weight/vit_base_patch16_224.dino`.

After the download completes, the directory structure should be:

    /opt/atomgit/weight/vit_base_patch16_224.dino/
    ├── config.json
    ├── model.safetensors
    └── pytorch_model.bin

Run inference with `python3 inference.py`. Notes on the script:

- `model.to("npu")` loads the model onto the NPU for inference.
- The input is a real image (`000000000139.jpg`), not randomly generated data.

The run produces the following log:

    [INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)
    [INFO] Loading image from: /opt/atomgit/vit_base_patch16_224.dino/sample_image.jpg
    [INFO] Loading model: vit_base_patch16_224.dino
    [INFO] Model moved to NPU device: Ascend910_9362
    [INFO] Input shape: torch.Size([1, 3, 224, 224])
    [INFO] Device: npu:0
    [INFO] Warm-up inference...
    [INFO] Running timed inference...
    ============================================================
    Inference Results
    ============================================================
    Device: NPU (Ascend910_9362)
    Latency: 9.92 ms
    Pooled output shape: torch.Size([1, 768])
    Features shape: torch.Size([1, 197, 768])
    Pooled output dtype: torch.float32
    Pooled output device: npu:0
    Pooled output first 5 values: [ 2.0662074 1.5527148 0.43426713 0.93954235 -0.8202585 ]
    ============================================================
    [INFO] Results saved to: /opt/atomgit/vit_base_patch16_224.dino/output/inference_result.json
    [INFO] Log saved to: /opt/atomgit/vit_base_patch16_224.dino/output/inference_log.txt

Inference artifacts are saved under the `output/` directory:

- `inference_result.json`: structured inference results (output tensor shapes, sample values, latency, and so on)
- `inference_log.txt`: the inference log as text

Run the performance benchmark with `python3 benchmark.py`. The run produces the following log:

    [INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)
    [INFO] Loading image from: /opt/atomgit/vit_base_patch16_224.dino/sample_image.jpg
    [INFO] Loading model: vit_base_patch16_224.dino
    [INFO] Model moved to NPU: Ascend910_9362
    [INFO] Benchmarking latency (5 warmup + 20 timed)...
    ============================================================
    Latency Benchmark Results
    ============================================================
    Device: NPU (Ascend910_9362)
    Iterations: 20
    Mean: 4.82 ms
    StdDev: 0.06 ms
    Min: 4.58 ms
    Max: 4.89 ms
    P50: 4.83 ms
    P90: 4.86 ms
    P99: 4.89 ms
    ============================================================
    [INFO] Benchmarking throughput for batch sizes: [1, 2, 4, 8]
    batch=1: 217.99 images/sec, avg_latency=4.59 ms
    batch=2: 438.91 images/sec, avg_latency=4.56 ms
    batch=4: 761.95 images/sec, avg_latency=5.25 ms
    batch=8: 1010.94 images/sec, avg_latency=7.91 ms
    ============================================================
    Throughput Benchmark Results
    ============================================================
    Batch= 1: 217.99 img/s | avg_latency= 4.59 ms
    Batch= 2: 438.91 img/s | avg_latency= 4.56 ms
    Batch= 4: 761.95 img/s | avg_latency= 5.25 ms
    Batch= 8: 1010.94 img/s | avg_latency= 7.91 ms
    ============================================================
    [INFO] Results saved to /opt/atomgit/vit_base_patch16_224.dino/output

Performance artifacts are saved under the `output/` directory:
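
- `benchmark_result.json`: structured performance results (latency distribution and throughput data)
- `benchmark_log.txt`: the performance log as text

| Metric | Value |
|---|---|
| Mean latency | 4.82 ms |
| Latency P50 | 4.83 ms |
| Latency P90 | 4.86 ms |
| Latency P99 | 4.89 ms |
| Throughput (bs=1) | 217.99 img/s |
| Throughput (bs=2) | 438.91 img/s |
| Throughput (bs=4) | 761.95 img/s |
| Throughput (bs=8) | 1010.94 img/s |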
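
The latency statistics and throughput figures above can be derived from a list of per-iteration timings. The sketch below shows one way to do so with the standard library. It is not the code in `benchmark.py`: the linearly interpolated percentile, the use of population standard deviation, and the function names are all assumptions made for illustration.

```python
import math
import statistics


def percentile(sorted_values, q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted values."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = (len(sorted_values) - 1) * q / 100.0
    low = math.floor(rank)
    high = math.ceil(rank)
    weight = rank - low
    return sorted_values[low] * (1.0 - weight) + sorted_values[high] * weight


def summarize_latency(latencies_ms) -> dict:
    """Summary statistics for per-iteration latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    return {
        "mean": statistics.fmean(ordered),
        "stdev": statistics.pstdev(ordered),  # population stdev (assumption)
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
    }


def throughput(batch_size: int, avg_latency_ms: float) -> float:
    """Images per second for a batch processed in `avg_latency_ms`."""
    return batch_size / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    # Batch sizes and average latencies copied from the throughput log above.
    for batch, latency in [(1, 4.59), (2, 4.56), (4, 5.25), (8, 7.91)]:
        print(f"batch={batch}: {throughput(batch, latency):.2f} img/s")
```

Recomputing throughput from the rounded latencies in the log gives values within about 0.5 img/s of the reported ones; the small differences are consistent with the displayed latencies being rounded to two decimals. When collecting timings on an accelerator, the device is usually synchronized before reading the clock so that asynchronous kernels are included in the measurement.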

Run the accuracy validation with `python3 accuracy.py`. The script compares NPU outputs against a CPU baseline; the pass criteria are a vector-level relative error < 1% and a cosine similarity > 0.999. The run produces the following log:

    [INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)
    [INFO] Loading image from: /opt/atomgit/vit_base_patch16_224.dino/sample_image.jpg
    [INFO] Loading model: vit_base_patch16_224.dino
    [INFO] Running CPU baseline inference...
    [INFO] CPU baseline done.
    [INFO] Running NPU inference...
    [INFO] NPU inference done.
    ============================================================
    Accuracy Validation Results
    ============================================================
    pooled_output:
    Shape: (1, 768)
    Vector Relative Error: 0.003396 (PASS)
    Cosine Similarity: 0.999994 (PASS)
    MSE: 0.0000644438
    Max Absolute Diff: 0.054185
    Overall: PASS
    features:
    Shape: (1, 197, 768)
    Vector Relative Error: 0.002380 (PASS)
    Cosine Similarity: 0.999997 (PASS)
    MSE: 0.0000277403
    Max Absolute Diff: 0.054185
    Overall: PASS
    ============================================================
    OVERALL: PASS (vector-level relative error < 1% and cosine similarity > 0.999)
    ============================================================
    [INFO] Results saved to /opt/atomgit/vit_base_patch16_224.dino/output

Accuracy validation artifacts are saved under the `output/` directory:

- `accuracy_result.json`: structured accuracy comparison results
- `accuracy_log.txt`: the accuracy log as text

| Output | Vector relative error | Cosine similarity | MSE | Result |
|---|---|---|---|---|
| pooled_output | 0.003396 | 0.999994 | 0.0000644438 | PASS |
| features | 0.002380 | 0.999997 | 0.0000277403 | PASS |
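
The four metrics in the accuracy results can be computed from a pair of flattened output tensors. The sketch below is a plain-Python illustration, not the code in `accuracy.py`. In particular, defining the vector relative error as the L2 norm of the difference divided by the L2 norm of the CPU reference is an assumption, and the function names are hypothetical.

```python
import math


def compare_vectors(reference, candidate) -> dict:
    """Compare two equal-length flat sequences of floats.

    `reference` plays the role of the CPU baseline and `candidate`
    the NPU output, both flattened to one dimension.
    """
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [c - r for r, c in zip(reference, candidate)]
    squared_error = sum(d * d for d in diffs)
    ref_norm = math.sqrt(sum(r * r for r in reference))
    cand_norm = math.sqrt(sum(c * c for c in candidate))
    if ref_norm == 0.0 or cand_norm == 0.0:
        raise ValueError("metrics are undefined for all-zero vectors")
    dot = sum(r * c for r, c in zip(reference, candidate))
    return {
        # Assumed definition: ||candidate - reference||_2 / ||reference||_2
        "relative_error": math.sqrt(squared_error) / ref_norm,
        "cosine_similarity": dot / (ref_norm * cand_norm),
        "mse": squared_error / len(diffs),
        "max_abs_diff": max(abs(d) for d in diffs),
    }


def passes(metrics: dict, max_rel_err: float = 0.01, min_cosine: float = 0.999) -> bool:
    """Apply the pass criteria stated in this document."""
    return (
        metrics["relative_error"] < max_rel_err
        and metrics["cosine_similarity"] > min_cosine
    )


if __name__ == "__main__":
    cpu = [2.0, 1.5, 0.4, 0.9, -0.8]  # made-up values for illustration only
    npu = [2.001, 1.499, 0.4, 0.901, -0.8]
    result = compare_vectors(cpu, npu)
    print(result, "PASS" if passes(result) else "FAIL")
```

With real tensors, the same quantities would normally be computed with numpy or torch after flattening the outputs; the pure-Python version is used here only to keep the example free of dependencies.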
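
- NPU compatibility is implemented through a monkey-patch (`torch.cuda -> torch.npu`). The original timm library code is not modified, so upgrading timm usually requires no re-adaptation.
- `torch.compile` is explicitly disabled by the scripts through the environment variable `TORCH_COMPILE_DISABLE=1` to avoid potential compatibility issues.
- The message `[LOG_WARNING] can not create directory, directory: /home/atomgit/ascend/log` indicates that the Ascend driver log directory has not been created. It does not affect inference results and can be ignored.
- The scripts reference the weights at `/opt/atomgit/weight/vit_base_patch16_224.dino/pytorch_model.bin`. Adjust the `weight_path` variable in `inference.py`, `benchmark.py`, and `accuracy.py` to match your actual download path.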
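
The monkey-patch and the `torch.compile` switch mentioned in these notes follow a common pattern, sketched below. This is not the repository's implementation: the function `redirect_cuda_to_npu`, the list of redirected attribute names, and the stand-in objects are all hypothetical, and setting the environment variable before `torch` is imported is a conservative assumption.

```python
import os
import types

# Assumption: set the switch before `import torch` so it is seen at import time.
os.environ.setdefault("TORCH_COMPILE_DISABLE", "1")


def redirect_cuda_to_npu(torch_module, names=("is_available", "device_count", "synchronize")):
    """Point selected `torch.cuda` attributes at their `torch.npu` counterparts.

    Returns the attribute names that were redirected. Names missing on the
    NPU namespace are skipped, so the function is safe to call repeatedly.
    """
    npu = getattr(torch_module, "npu", None)
    cuda = getattr(torch_module, "cuda", None)
    if npu is None or cuda is None:
        return []
    patched = []
    for name in names:
        if hasattr(npu, name):
            setattr(cuda, name, getattr(npu, name))
            patched.append(name)
    return patched


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without torch or torch-npu installed.
    fake_torch = types.SimpleNamespace(
        cuda=types.SimpleNamespace(is_available=lambda: False),
        npu=types.SimpleNamespace(is_available=lambda: True),
    )
    print(redirect_cuda_to_npu(fake_torch))
    print(fake_torch.cuda.is_available())
```

Because the redirection happens on the `torch` module at runtime, library code that calls `torch.cuda.*` can stay unmodified, which is the reason the notes above expect timm upgrades to need no re-adaptation.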