timm/vit_base_patch16_224.augreg_in1k on Ascend NPU

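1. Introduction

This project adapts timm/vit_base_patch16_224.augreg_in1k to the Huawei Ascend NPU (Ascend910), covering single-card inference, accuracy consistency verification, and performance benchmarking.

Model source: ModelScope timm/vit_base_patch16_224.augreg_in1k
Framework: timm
Weight loading: timm.create_model(pretrained=False), with weights loaded from a local copy fetched by ModelScope snapshot_download
Input size: 224x224
Output: logits over the 1000 ImageNet classes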
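The weight-loading path described above can be sketched as follows. This is a minimal illustration, not the project's actual inference.py: the checkpoint file names in `CANDIDATES`, the helper names `pick_weight_file` and `load_model`, and the `npu:0` device string are assumptions, and the real layout of the ModelScope repository may differ.

```python
from pathlib import Path
from typing import Iterable, Optional

# Repo id and timm model name, both derived from the model name in this README.
MODEL_ID = "timm/vit_base_patch16_224.augreg_in1k"
TIMM_NAME = "vit_base_patch16_224.augreg_in1k"

# ASSUMPTION: checkpoint file names, in order of preference. The README does
# not state which files the ModelScope repository actually contains.
CANDIDATES = ("model.safetensors", "pytorch_model.bin")


def pick_weight_file(names: Iterable[str]) -> Optional[str]:
    """Return the preferred checkpoint file name among `names`, or None."""
    available = set(names)
    for candidate in CANDIDATES:
        if candidate in available:
            return candidate
    return None


def load_model(device: str = "npu:0"):
    """Build the ViT with pretrained=False and load locally cached weights."""
    # Heavy imports stay local so pick_weight_file() is usable without them.
    import timm
    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    from modelscope import snapshot_download

    # Downloads on the first call, then returns the cached directory.
    model_dir = Path(snapshot_download(MODEL_ID))
    name = pick_weight_file(p.name for p in model_dir.iterdir())
    if name is None:
        raise FileNotFoundError(f"no known checkpoint file in {model_dir}")

    if name.endswith(".safetensors"):
        from safetensors.torch import load_file
        state_dict = load_file(str(model_dir / name))
    else:
        state_dict = torch.load(model_dir / name, map_location="cpu")

    # pretrained=False keeps timm from downloading anything itself.
    model = timm.create_model(TIMM_NAME, pretrained=False, num_classes=1000)
    model.load_state_dict(state_dict)
    return model.eval().to(device)
```

Calling `load_model()` on a machine with torch_npu installed would return the model in eval mode on the first NPU; passing `device="cpu"` gives the reference model for a CPU comparison.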

2. Verification Environment

Item	Version/Model
NPU	Ascend910
npu-smi	25.5.2
PyTorch	2.9.0+cpu
torch_npu	available
timm	latest
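An environment check of the kind summarized in this table might look like the sketch below. The function name `check_npu_env` and the returned keys are hypothetical; the only library calls assumed are `torch.npu.is_available()` and `torch.npu.device_count()`, which become available once torch_npu is imported.

```python
from typing import Any, Dict


def check_npu_env() -> Dict[str, Any]:
    """Report PyTorch / torch_npu availability without raising if a package is missing."""
    info: Dict[str, Any] = {
        "torch": None,
        "torch_npu": None,
        "npu_available": False,
        "npu_count": 0,
    }
    try:
        import torch
        info["torch"] = torch.__version__
    except Exception:  # not installed, or failed to initialise
        return info
    try:
        import torch_npu  # importing this registers torch.npu
        info["torch_npu"] = getattr(torch_npu, "__version__", "unknown")
        info["npu_available"] = bool(torch.npu.is_available())
        if info["npu_available"]:
            info["npu_count"] = int(torch.npu.device_count())
    except Exception:  # torch_npu missing or the driver stack is not usable
        pass
    return info
```

Printing the returned dictionary gives a compact record that can be compared against the versions listed here.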

3. Running Inference

pip install -r requirements.txt
python inference.py

NPU inference results (single test image):

Item	Value
Input shape	[1, 3, 224, 224]
Output shape	[1, 1000]
Top-1	class_21 (0.203968)
Top-2	class_22 (0.121386)
Top-3	class_128 (0.109912)
Top-4	class_23 (0.099468)
Top-5	class_127 (0.059127)
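A top-5 listing like the one above can be derived from the 1000 logits roughly as follows. This sketch assumes the values in parentheses are softmax probabilities and that `class_N` is simply the raw class index; the README states neither explicitly.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most probable classes as (label, probability) pairs."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # ASSUMPTION: labels are plain class indices, formatted as class_<index>.
    return [(f"class_{i}", probs[i]) for i in order]
```

With a PyTorch output tensor of shape [1, 1000], `top_k(output[0].tolist())` would produce the five rows of the table.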

4. Accuracy Verification

CPU vs. NPU consistency check on a single test image:

Metric	Value
max_abs_error	0.013841
mean_abs_error	0.002110
relative_error	0.2032%
cosine_similarity	0.999998
threshold	1.0%
Result	PASS

CPU Top-1: class_21
NPU Top-1: class_21
CPU Top-5: class_21, class_22, class_128, class_23, class_127
NPU Top-5: class_21, class_22, class_128, class_23, class_127
Top-1 match: True
Top-5 match: True
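The metrics in this section can be computed from the two output vectors along the lines of the sketch below. The README does not define relative_error; here it is assumed to be the L2 norm of the difference divided by the L2 norm of the CPU output. Under that assumption, a cosine similarity of 0.999998 between vectors of similar norm implies a relative error of about sqrt(2 * (1 - 0.999998)) ≈ 0.2%, which is compatible with the reported 0.2032%. The function name and dictionary keys are illustrative.

```python
import math
from typing import Dict, List, Sequence


def top_indices(values: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest values, largest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def compare_outputs(ref: Sequence[float], test: Sequence[float], k: int = 5) -> Dict[str, object]:
    """Compare a reference (CPU) output with a test (NPU) output of equal length."""
    diffs = [abs(r - t) for r, t in zip(ref, test)]
    ref_norm = math.sqrt(sum(r * r for r in ref))
    test_norm = math.sqrt(sum(t * t for t in test))
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    dot = sum(r * t for r, t in zip(ref, test))
    return {
        "max_abs_error": max(diffs),
        "mean_abs_error": sum(diffs) / len(diffs),
        # ASSUMPTION: relative error as a ratio of L2 norms.
        "relative_error": diff_norm / ref_norm,
        "cosine_similarity": dot / (ref_norm * test_norm),
        "top1_match": top_indices(ref, 1) == top_indices(test, 1),
        "topk_match": top_indices(ref, k) == top_indices(test, k),
    }
```

A PASS/FAIL decision then reduces to comparing `relative_error` against the 1.0% threshold, if that is indeed the quantity the threshold applies to.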

5. Performance Reference

python benchmark.py

Metric	Value
avg latency	5.282 ms
min latency	5.220 ms
max latency	5.372 ms
p50 latency	5.265 ms
p90 latency	5.372 ms
p95 latency	5.372 ms
Throughput	189.30 images/sec
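The latency statistics above could be produced by a loop like the following sketch. It is not the project's benchmark.py: the warmup and iteration counts, the nearest-rank percentile method, and the function names are all assumptions. Throughput is taken as batch size divided by average latency, which for batch size 1 gives 1000 / 5.282 ≈ 189.3 images/sec, close to the reported figure.

```python
import math
import time
from typing import Callable, Dict, List, Optional, Sequence


def time_runs(fn: Callable[[], object], warmup: int = 10, iters: int = 100,
              sync: Optional[Callable[[], object]] = None) -> List[float]:
    """Time `fn` for `iters` runs after `warmup` untimed runs; returns milliseconds."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous device work before stopping the clock
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(sorted_ms: Sequence[float], q: int) -> float:
    """Nearest-rank percentile of an ascending sequence (one of several conventions)."""
    rank = max(1, math.ceil(q * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarize(latencies_ms: Sequence[float], batch_size: int = 1) -> Dict[str, float]:
    """Aggregate per-run latencies into the metrics reported in the table."""
    data = sorted(latencies_ms)
    avg = sum(data) / len(data)
    return {
        "avg_ms": avg,
        "min_ms": data[0],
        "max_ms": data[-1],
        "p50_ms": percentile(data, 50),
        "p90_ms": percentile(data, 90),
        "p95_ms": percentile(data, 95),
        "throughput": batch_size * 1000.0 / avg,
    }
```

On the NPU, passing `sync=torch.npu.synchronize` (available once torch_npu is imported) keeps asynchronously launched kernels inside the timed region; without a synchronization point the measured latency can be misleadingly low.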

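6. Notes on Accuracy Evaluation

This project includes only a single-image smoke consistency check, not an evaluation on the full official ImageNet validation set. See Section 4 for the detailed metrics.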

7. Self-Verification Screenshots

See screenshots/self_verification.png and screenshots/self_verification.txt.

8. Log Files

Log	Description
`logs/env_check.log`	NPU environment check
`logs/inference.log`	NPU inference results
`logs/accuracy.log`	CPU-NPU accuracy consistency
`logs/benchmark.log`	Performance benchmark

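9. Notes

Weights are downloaded through ModelScope snapshot_download and cached locally at run time; no manual placement is needed.
Do not use timm.create_model(..., pretrained=True), which triggers an automatic download from HuggingFace.
Do not commit weight files such as *.bin, *.safetensors, *.pth, *.pt, *.ckpt, or *.onnx to the repository.
The first run triggers the ModelScope download; how long it takes depends on network speed.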
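One way to guard against committing weight files by accident is a small scan that can be run before each commit. The sketch below is an optional helper, not part of the project; the function names are hypothetical and the pattern list simply mirrors the extensions named above.

```python
import fnmatch
from pathlib import Path
from typing import List

# Extensions listed in the notes above.
WEIGHT_PATTERNS = ("*.bin", "*.safetensors", "*.pth", "*.pt", "*.ckpt", "*.onnx")


def is_weight_file(name: str) -> bool:
    """True if the file name matches one of the weight-file patterns (case-insensitive)."""
    lowered = name.lower()
    return any(fnmatch.fnmatch(lowered, pattern) for pattern in WEIGHT_PATTERNS)


def find_weight_files(root: str = ".") -> List[Path]:
    """List weight files under `root`, skipping anything inside a .git directory."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and ".git" not in path.parts and is_weight_file(path.name)
    )
```

If `find_weight_files()` returns a non-empty list, those paths should be removed from the working tree or excluded from version control before committing.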

10. Tags

#NPU #Ascend #Ascend910 #timm #ViT #ImageNet
