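This document records the adaptation and validation results of the google/vit-base-patch16-224-in21k model on Ascend NPU.

vit-base-patch16-224-in21k is a Vision Transformer model released by Google (ViT-Base, patch size 16), pretrained on the ImageNet-21k dataset at an input resolution of 224×224. It is suited to downstream tasks such as image classification and feature extraction.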
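The patch size and input resolution above determine the token count that shows up in the inference log (`Last hidden state shape: torch.Size([1, 197, 768])`). The following is a minimal arithmetic sketch, not code from the repository; the function name `vit_sequence_length` is hypothetical, and the hidden size of 768 is the standard ViT-Base value.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens: one per image patch plus one [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


HIDDEN_SIZE = 768  # ViT-Base hidden dimension

# Shape of last_hidden_state for a batch containing a single image.
seq_len = vit_sequence_length()
print((1, seq_len, HIDDEN_SIZE))  # (1, 197, 768)
```

With 224 / 16 = 14 patches per side, the model sees 14 × 14 = 196 patch tokens plus the [CLS] token, which gives 197.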

| Component | Version |
|---|---|
| torch | 2.9.0+cpu |
| torch-npu | 2.9.0.post1+gitee7ba04 |
| transformers | >=4.30.0 |
| Pillow | >=9.0.0 |
| numpy | 1.26.4 |
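
- Hardware: Ascend910
- Weight path: `/opt/atomgit/weight/vit-base-patch16-224-in21k`
- Test data: `sample_image.jpg` (a real COCO val2017 sample, included in the repository)

The sample image used for inference validation in this repository is `000000000139.jpg` from the MS COCO 2017 validation set. It is committed under the name `sample_image.jpg`, and its MD5 matches the original COCO val2017 image exactly (`a0204aa65acc51cd8ffc128e5e94a05c`). The image is 640 × 426 pixels, RGB, JPEG-encoded, and about 158 KB. Because the `data/` directory is excluded by `.gitignore` (to avoid committing large datasets), a copy of this single sample image is committed to the repository root so that the scripts can be run directly after cloning.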
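The MD5 claim can be checked locally. The sketch below is not part of the repository; the helper names `file_md5` and `matches_expected` are hypothetical, and it assumes `sample_image.jpg` sits in the current working directory.

```python
import hashlib
from pathlib import Path

# MD5 of COCO val2017 000000000139.jpg as quoted in this README.
EXPECTED_MD5 = "a0204aa65acc51cd8ffc128e5e94a05c"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected digest."""
    return file_md5(path) == expected.lower()


if __name__ == "__main__":
    sample = Path("sample_image.jpg")  # assumed location: repository root
    if sample.exists():
        print("MD5 match:", matches_expected(sample))
    else:
        print("sample_image.jpg not found in the current directory")
```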
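
Install the dependencies: `pip install torch==2.9.0 transformers Pillow numpy`

Note: `torch-npu` is preinstalled in this environment. When deploying to a new environment, follow the official CANN installation guide to install the matching version of the CANN driver and firmware.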
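Before running the scripts in a fresh environment, it can help to compare the installed packages against the version table. This is a minimal sketch using only the standard library; the distribution names are taken from the table in this README and are assumed to match what `pip` registers, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Distribution names as listed in the environment table (assumed to match pip).
PACKAGES = ["torch", "torch-npu", "transformers", "Pillow", "numpy"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:14s} {version or 'NOT INSTALLED'}")
```

The sketch only reports versions; it does not check that the CANN driver and firmware are present.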

Download the weights: `python3 -m atomgit download hf_mirrors/google/vit-base-patch16-224-in21k -d /opt/atomgit/weight/vit-base-patch16-224-in21k`

After the download completes, the directory structure should be:

/opt/atomgit/weight/vit-base-patch16-224-in21k/
├── config.json
├── model.safetensors
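└── pytorch_model.bin

Run the inference script: `python3 inference.py`

- `model.to("npu")` loads the model onto the NPU for inference.
- The input is a real image (COCO val2017 `000000000139.jpg`), not randomly generated data.

[INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)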
[INFO] Loading image from: /opt/atomgit/vit-base-patch16-224-in21k/sample_image.jpg
[INFO] Loading model from: /opt/atomgit/weight/vit-base-patch16-224-in21k
Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
[INFO] Model moved to NPU device: Ascend910_9362
[INFO] Input shape: torch.Size([1, 3, 224, 224])
[INFO] Device: npu:0
[INFO] Warm-up inference...
[INFO] Running timed inference...
============================================================
Inference Results
============================================================
Device: NPU (Ascend910_9362)
Latency: 5.91 ms
Pooled output shape: torch.Size([1, 768])
Last hidden state shape: torch.Size([1, 197, 768])
Pooled output dtype: torch.float32
Pooled output device: npu:0
Pooled output first 5 values: [ 0.4473943 0.03563369 -0.14180312 -0.01393258 -0.37608528]
============================================================
[INFO] Results saved to: /opt/atomgit/vit-base-patch16-224-in21k/output/inference_result.json
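[INFO] Log saved to: /opt/atomgit/vit-base-patch16-224-in21k/output/inference_log.txt

Inference artifacts are saved in the `output/` directory:

- `inference_result.json`: structured inference results (output tensor shapes, sample values, latency, and so on)
- `inference_log.txt`: inference log text

Run the benchmark script: `python3 benchmark.py`

[INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)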
[INFO] Loading image from: /opt/atomgit/vit-base-patch16-224-in21k/sample_image.jpg
[INFO] Loading model from: /opt/atomgit/weight/vit-base-patch16-224-in21k
Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
[INFO] Model moved to NPU: Ascend910_9362
[INFO] Benchmarking latency (5 warmup + 20 timed)...
============================================================
Latency Benchmark Results
============================================================
Device: NPU (Ascend910_9362)
Iterations: 20
Mean: 5.58 ms
StdDev: 0.14 ms
Min: 5.28 ms
Max: 5.76 ms
P50: 5.63 ms
P90: 5.74 ms
P99: 5.76 ms
============================================================
[INFO] Benchmarking throughput for batch sizes: [1, 2, 4, 8]
batch=1: 187.67 images/sec, avg_latency=5.33 ms
batch=2: 379.29 images/sec, avg_latency=5.27 ms
batch=4: 709.57 images/sec, avg_latency=5.64 ms
batch=8: 982.54 images/sec, avg_latency=8.14 ms
============================================================
Throughput Benchmark Results
============================================================
Batch= 1: 187.67 img/s | avg_latency= 5.33 ms
Batch= 2: 379.29 img/s | avg_latency= 5.27 ms
Batch= 4: 709.57 img/s | avg_latency= 5.64 ms
Batch= 8: 982.54 img/s | avg_latency= 8.14 ms
============================================================
[INFO] Results saved to /opt/atomgit/vit-base-patch16-224-in21k/output

Benchmark artifacts are saved in the `output/` directory:

- `benchmark_result.json`: structured performance results (latency distribution, throughput data)
- `benchmark_log.txt`: performance log text

| Metric | Value |
|---|---|
| Mean latency | 5.58 ms |
| Latency P50 | 5.63 ms |
| Latency P90 | 5.74 ms |
| Latency P99 | 5.76 ms |
| Throughput (bs=1) | 187.67 img/s |
| Throughput (bs=2) | 379.29 img/s |
| Throughput (bs=4) | 709.57 img/s |
| Throughput (bs=8) | 982.54 img/s |
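The latency and throughput figures above can be reproduced in spirit with a small timing harness. The sketch below is an illustration, not the code in `benchmark.py`: the helper names are hypothetical, the percentile uses linear interpolation, the standard deviation is the population form, and the `sync` hook stands in for a device synchronization call (assumed to be `torch.npu.synchronize` on an NPU).

```python
import statistics
import time


def time_calls(fn, warmup=5, iters=20, sync=None):
    """Return per-call latencies in milliseconds after some warm-up calls.

    `sync` is an optional callable invoked around each clock read; on an
    accelerator it should block until queued work has finished
    (assumption: torch.npu.synchronize on Ascend).
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(iters):
        sync()
        start = time.perf_counter()
        fn()
        sync()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(latencies_ms):
    """Summary statistics in the same layout as the benchmark log."""
    return {
        "mean": statistics.fmean(latencies_ms),
        "stdev": statistics.pstdev(latencies_ms),  # population form (assumption)
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p90": percentile(latencies_ms, 90),
        "p99": percentile(latencies_ms, 99),
    }


def throughput(batch_size, avg_latency_ms):
    """Images per second for one batch processed in avg_latency_ms."""
    return batch_size / (avg_latency_ms / 1000.0)
```

For example, `throughput(8, 8.14)` gives roughly 982.8 img/s, in line with the reported 982.54 img/s once rounding of the latency is taken into account. With only 20 timed iterations, P99 is effectively the maximum, which matches the log (both 5.76 ms).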

Run the accuracy script: `python3 accuracy.py`

- Pass criteria: vector-level relative error < 1% and cosine similarity > 0.999.

[INFO] Applied NPU monkey-patch (torch.cuda -> torch.npu)
[INFO] Loading image from: /opt/atomgit/vit-base-patch16-224-in21k/sample_image.jpg
[INFO] Loading model from: /opt/atomgit/weight/vit-base-patch16-224-in21k
Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
[INFO] Running CPU baseline inference...
[INFO] CPU baseline done.
[INFO] Running NPU inference...
[INFO] NPU inference done.
============================================================
Accuracy Validation Results
============================================================
pooler_output:
Shape: (1, 768)
Vector Relative Error: 0.004624 (PASS)
Cosine Similarity: 0.999990 (PASS)
MSE: 0.0000030698
Max Absolute Diff: 0.005724
Overall: PASS
last_hidden_state:
Shape: (1, 197, 768)
Vector Relative Error: 0.005795 (PASS)
Cosine Similarity: 0.999983 (PASS)
MSE: 0.0000012771
Max Absolute Diff: 0.006362
Overall: PASS
============================================================
OVERALL: PASS (vector-level relative error < 1% and cosine similarity > 0.999)
============================================================
[INFO] Results saved to /opt/atomgit/vit-base-patch16-224-in21k/output

Accuracy validation artifacts are saved in the `output/` directory:

- `accuracy_result.json`: structured accuracy comparison results
- `accuracy_log.txt`: accuracy log text

| Output | Vector relative error | Cosine similarity | MSE | Result |
|---|---|---|---|---|
| pooler_output | 0.004624 | 0.999990 | 0.0000030698 | PASS |
| last_hidden_state | 0.005795 | 0.999983 | 0.0000012771 | PASS |
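The metrics in this table can be computed from the flattened CPU and NPU outputs. The sketch below is dependency-free and is not the code in `accuracy.py`; it assumes that "vector relative error" means the L2 norm of the difference divided by the L2 norm of the CPU reference, which is consistent with the reported MSE and cosine similarity values. The function names are hypothetical.

```python
import math


def accuracy_metrics(reference, candidate):
    """Compare two equal-length flat sequences of floats.

    reference: CPU baseline values; candidate: NPU values.
    Relative error is ||candidate - reference||_2 / ||reference||_2 (assumption).
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    diff_sq = sum((c - r) ** 2 for r, c in zip(reference, candidate))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    cand_norm = math.sqrt(sum(c * c for c in candidate))
    if ref_norm == 0.0 or cand_norm == 0.0:
        raise ValueError("inputs must have non-zero norm")
    dot = sum(r * c for r, c in zip(reference, candidate))
    return {
        "relative_error": math.sqrt(diff_sq) / ref_norm,
        "cosine_similarity": dot / (ref_norm * cand_norm),
        "mse": diff_sq / len(reference),
        "max_abs_diff": max(abs(c - r) for r, c in zip(reference, candidate)),
    }


def passes(metrics, max_rel_err=0.01, min_cosine=0.999):
    """Apply the thresholds stated in the log: < 1% and > 0.999."""
    return (metrics["relative_error"] < max_rel_err
            and metrics["cosine_similarity"] > min_cosine)
```

With tensors, the inputs would be something like `tensor.flatten().tolist()` for the CPU and NPU outputs; a NumPy version would follow the same formulas.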
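
- NPU compatibility is achieved through a monkey-patch (`torch.cuda -> torch.npu`). The original transformers library code is not modified, so upgrading transformers usually requires no re-adaptation.
- `torch.compile` is explicitly disabled in the scripts through the environment variable `TORCH_COMPILE_DISABLE=1`, to avoid potential compatibility issues.
- The message `[LOG_WARNING] can not create directory, directory: /home/atomgit/ascend/log` only indicates that the Ascend driver log directory has not been created. It does not affect the inference results and can be ignored.
- The scripts use the weight path `/opt/atomgit/weight/vit-base-patch16-224-in21k`. Change the `model_path` variable in `inference.py`, `benchmark.py`, and `accuracy.py` to match your actual download path.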
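Two of these notes concern script configuration: the `TORCH_COMPILE_DISABLE` variable and the hard-coded `model_path`. The sketch below shows one possible way to handle both at the top of a script. It is an illustration only: the `VIT_MODEL_PATH` override and both helper names are hypothetical and do not exist in the repository, where `model_path` is edited by hand.

```python
import os
from pathlib import Path

# Default weight location used throughout this README.
DEFAULT_MODEL_PATH = "/opt/atomgit/weight/vit-base-patch16-224-in21k"


def disable_torch_compile(env=os.environ):
    """Set TORCH_COMPILE_DISABLE=1; to be safe, call this before importing torch."""
    env["TORCH_COMPILE_DISABLE"] = "1"


def resolve_model_path(env=os.environ, default=DEFAULT_MODEL_PATH):
    """Use the hypothetical VIT_MODEL_PATH override if set, else the default."""
    return Path(env.get("VIT_MODEL_PATH") or default)


if __name__ == "__main__":
    disable_torch_compile()
    model_path = resolve_model_path()
    print("TORCH_COMPILE_DISABLE =", os.environ["TORCH_COMPILE_DISABLE"])
    print("model_path =", model_path)
```

Reading the path from the environment avoids editing three scripts when the weights live elsewhere, at the cost of one more setting to document.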