timm/vit_small_patch32_224.augreg_in21k on Ascend NPU

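1. Introduction

This project adapts the timm/vit_small_patch32_224.augreg_in21k image classification model to run on the Ascend NPU (Ascend910B).

Model architecture: Vision Transformer (ViT) Small, patch size 32, 224x224 input
Pretraining data: ImageNet-21k (augreg)
Output classes: 21843
Loading method: weights are downloaded with ModelScope snapshot_download; timm.create_model(pretrained=False) builds the architecture, and the local weights are then loaded into it
Inference framework: PyTorch + torch_npu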
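
The loading method described above can be sketched roughly as follows. This is not the project's actual model_utils.load_model(), only an illustration under several assumptions: that the ModelScope repo id equals the model name used in this README, that the weight file is named model.safetensors or pytorch_model.bin, and that find_weight_file is a hypothetical helper introduced here.

```python
from pathlib import Path

# Assumption: the ModelScope repo id matches the model name used in this README.
MODEL_ID = "timm/vit_small_patch32_224.augreg_in21k"
# timm architecture name including its pretrained tag.
TIMM_NAME = "vit_small_patch32_224.augreg_in21k"
# Assumption: candidate weight file names, checked in this order.
WEIGHT_CANDIDATES = ("model.safetensors", "pytorch_model.bin")


def find_weight_file(model_dir):
    """Return the first known weight file found directly inside model_dir."""
    for name in WEIGHT_CANDIDATES:
        candidate = Path(model_dir) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no weight file found in {model_dir}")


def load_model(device="npu:0"):
    """Download weights through ModelScope and load them into a timm model."""
    # Heavy imports stay local so find_weight_file() works without them.
    import timm
    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    from modelscope import snapshot_download
    from safetensors.torch import load_file

    model_dir = snapshot_download(MODEL_ID)  # returns the local cache directory
    weight_path = find_weight_file(model_dir)

    # pretrained=False builds the architecture only; no download is triggered.
    model = timm.create_model(TIMM_NAME, pretrained=False, num_classes=21843)

    if weight_path.suffix == ".safetensors":
        state_dict = load_file(str(weight_path))
    else:
        state_dict = torch.load(weight_path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model.eval().to(device)
```

Passing num_classes=21843 explicitly keeps the classifier head matched to the ImageNet-21k checkpoint, so load_state_dict fails loudly if the shapes disagree.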

2. Validation environment

NPU: Ascend910B
npu-smi: 25.5.2
torch: (current environment)
torch_npu: (current environment)

npu_available=True, device=Ascend910_9362
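
A status line in this form could be produced by a small check like the one below. This is a sketch, not the project's environment script; the npu_status name and the fallback string returned when torch_npu is missing are assumptions made for illustration.

```python
def npu_status():
    """Return a one-line NPU status string, or a fallback if torch_npu is missing."""
    try:
        import torch
        import torch_npu  # noqa: F401  (adds the torch.npu namespace)
    except ImportError:
        return "npu_available=False, device=None"

    available = torch.npu.is_available()
    device = torch.npu.get_device_name(0) if available else None
    return f"npu_available={available}, device={device}"


if __name__ == "__main__":
    print(npu_status())
```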

3. Running inference

pip install -r requirements.txt
python inference.py

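The inference script:

Loads the model and weights from the ModelScope cache via model_utils.load_model()
Moves the model to npu:0
Preprocesses assets/test.jpg and runs inference on it
Prints the Top-5 predictions to the terminal and saves them to logs/inference.log

Example output: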

Output shape: torch.Size([1, 21843])
Top-5 predictions:
  1. class_17516: 0.957400
  2. class_21727: 0.016571
  3. class_20968: 0.001999
  4. class_20467: 0.001391
  5. class_21451: 0.001067
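
The preprocessing and Top-5 steps might look like the sketch below. It assumes the reported scores are softmax probabilities and that the eval transform is derived from the model's own timm config; softmax, top_k, format_top_k, and predict are hypothetical helper names, not functions from inference.py.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return (index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def format_top_k(pairs):
    """Render predictions as class_{index} lines, since no id2label map exists."""
    lines = [f"Top-{len(pairs)} predictions:"]
    for rank, (index, prob) in enumerate(pairs, start=1):
        lines.append(f"  {rank}. class_{index}: {prob:.6f}")
    return "\n".join(lines)


def predict(model, image_path, device="npu:0"):
    """Preprocess one image, run the model, and return its top-5 predictions."""
    import torch
    from PIL import Image
    from timm.data import create_transform, resolve_model_data_config

    # Build the eval transform from the model's own pretrained config.
    config = resolve_model_data_config(model)
    transform = create_transform(**config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        logits = model(batch)  # shape [1, 21843]
    return top_k(softmax(logits[0].float().cpu().tolist()))
```

With a loaded model, print(format_top_k(predict(model, "assets/test.jpg"))) would print lines in the same layout as the example output.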

4. Accuracy verification

python eval_accuracy.py

CPU vs. NPU consistency check on a single test image:

Metric	Value
max_abs_error	0.033958
mean_abs_error	0.005001
relative_error	0.0321%
cosine_similarity	1.000000
threshold	1.0%
Result	PASS

CPU Top-1 and NPU Top-1 classes match
CPU Top-5 and NPU Top-5 classes match
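
The metrics in this table could be computed along the following lines. The exact definitions used by eval_accuracy.py are not stated here, so this sketch makes assumptions: relative error is taken as the L2 norm of the difference divided by the L2 norm of the CPU output, the threshold applies to that relative error, and Top-k agreement is checked in rank order. compare_outputs and same_top_k are hypothetical names.

```python
import math


def compare_outputs(cpu, npu, threshold=0.01):
    """Compare two flat output lists and report CPU-vs-NPU agreement metrics."""
    diffs = [abs(a - b) for a, b in zip(cpu, npu)]
    cpu_norm = math.sqrt(sum(a * a for a in cpu))
    npu_norm = math.sqrt(sum(b * b for b in npu))
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    dot = sum(a * b for a, b in zip(cpu, npu))

    # Assumption: relative error is ||npu - cpu||_2 / ||cpu||_2; the project
    # may define it differently.
    relative_error = diff_norm / cpu_norm
    return {
        "max_abs_error": max(diffs),
        "mean_abs_error": sum(diffs) / len(diffs),
        "relative_error": relative_error,
        "cosine_similarity": dot / (cpu_norm * npu_norm),
        "passed": relative_error < threshold,
    }


def same_top_k(cpu, npu, k=5):
    """Check that both outputs rank the same k classes in the same order."""
    def rank(values):
        return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return rank(cpu) == rank(npu)
```

Both functions take plain Python lists, for example output[0].float().cpu().tolist() from each device.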

5. Performance reference

python benchmark.py

Metric	Value
Avg latency	5.181 ms
Min latency	5.116 ms
Max latency	5.222 ms
P50 latency	5.183 ms
P90 latency	5.218 ms
P95 latency	5.220 ms
Throughput	193.00 images/sec

Test configuration: 2 warm-up runs, 10 timed runs, batch size 1, 224x224 input.
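
A measurement loop matching this configuration might look like the sketch below. It is not the contents of benchmark.py: the benchmark and percentile helpers are hypothetical, and the percentile method (linear interpolation) is an assumption. On an NPU, the sync argument would typically be torch.npu.synchronize so that each timing covers the finished device work rather than just the kernel launch.

```python
import time


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 100]."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def benchmark(step, warmup=2, runs=10, batch_size=1, sync=None):
    """Time `step` and summarize latency in milliseconds.

    `sync` is an optional callable (for example torch.npu.synchronize) that
    blocks until queued device work has finished.
    """
    def timed_call():
        start = time.perf_counter()
        step()
        if sync is not None:
            sync()
        return (time.perf_counter() - start) * 1000.0

    for _ in range(warmup):  # warm-up runs are executed but not recorded
        timed_call()
    latencies = sorted(timed_call() for _ in range(runs))

    average = sum(latencies) / len(latencies)
    return {
        "avg_ms": average,
        "min_ms": latencies[0],
        "max_ms": latencies[-1],
        "p50_ms": percentile(latencies, 50),
        "p90_ms": percentile(latencies, 90),
        "p95_ms": percentile(latencies, 95),
        "throughput": batch_size * 1000.0 / average,  # images per second
    }
```

Throughput here is batch_size * 1000 / avg_ms; with the reported average of 5.181 ms this gives about 193 images/sec, in line with the table.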

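6. Dataset accuracy evaluation

This project provides only a CPU-NPU smoke consistency check. For an official accuracy evaluation on a standard dataset such as ImageNet, prepare the dataset yourself and run the standard timm evaluation flow.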

7. Self-verification screenshots

See screenshots/self_verification.png and screenshots/self_verification.txt.

8. Log files

logs/inference.log: inference results
logs/accuracy.log: CPU-NPU consistency check
logs/benchmark.log: performance benchmark
logs/env_check.log: NPU environment information

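9. Notes

Weights are downloaded automatically to the local cache by ModelScope snapshot_download and are not committed with the project.
Do not use timm.create_model(..., pretrained=True), which triggers a direct download from HuggingFace.
The model outputs 21843 classes (ImageNet-21k) and has no standard id2label mapping, so predictions are shown as class_{index}.
The project's .gitignore excludes weight files (.safetensors, .bin, .pth, etc.).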

10. Tags

#NPU
