m0_74196153/tiny_vit_21m_512_dist_in22k_ft_in1k_npu

tiny_vit_21m_512.dist_in22k_ft_in1k Ascend NPU Adaptation

1. Model Introduction

tiny_vit_21m_512.dist_in22k_ft_in1k is an image classification model built on the TinyViT (Tiny Vision Transformer) architecture.

Model name: tiny_vit_21m_512.dist_in22k_ft_in1k
Architecture: TinyViT
Parameters: 21.3M
Input size: 512x512
Number of classes: 1000
Training data: ImageNet-22K (distillation pretraining), fine-tuned on ImageNet-1K

Original model locations

HuggingFace: https://huggingface.co/timm/tiny_vit_21m_512.dist_in22k_ft_in1k
ModelScope: https://www.modelscope.cn/models/timm/tiny_vit_21m_512.dist_in22k_ft_in1k

Task type

Image classification

Input format

RGB image, 512x512, normalized with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]
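The normalization above can be sketched per pixel as follows. This is a minimal illustration in plain Python, assuming 8-bit pixel values are first scaled to [0, 1] (as torchvision's `ToTensor` does) before the mean and std are applied. The helper name `normalize_pixel` is hypothetical and does not come from the repository.

```python
# ImageNet channel statistics quoted in the input-format description.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the normalized values the model expects.

    Assumption: values are scaled to [0, 1] before mean/std normalization.
    """
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


# A pixel close to the dataset mean maps to values near zero.
print(normalize_pixel((124, 116, 104)))
```

In a real pipeline the same arithmetic is applied to the whole 3x512x512 tensor at once after resizing.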

Output format

Logits for 1000 classes, converted to probabilities with Softmax
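The logits-to-probabilities step can be sketched as below. It uses a short toy list in place of the 1000 real logits, and the helper names `softmax` and `top_k` are illustrative rather than taken from the repository's scripts.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return (class_index, probability) pairs, highest probability first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


toy_logits = [0.1, 2.0, -1.0, 0.5]  # stand-in for the 1000 real logits
probs = softmax(toy_logits)
print(top_k(probs, k=2))
```

With PyTorch tensors the equivalent operations would be `torch.softmax` followed by `torch.topk`.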

2. Verification Environment

Component	Version
NPU	Ascend910
CANN	25.5.2
PyTorch	2.9.0
torch_npu	2.9.0.post1+gitee7ba04
timm	1.0.27

3. NPU Adaptation Notes

Weights are downloaded from ModelScope and loaded onto the NPU via torch_npu. Inference runs in FP32, and the model code needs no changes.

4. Environment Setup

pip install torch torchvision timm modelscope safetensors Pillow

5. Inference Commands

# CPU inference
python inference.py --device cpu

# NPU inference
python inference.py --device npu

# Accuracy comparison
python compare_cpu_npu.py
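The internals of `inference.py` are not shown here, so the following is only a sketch of how a `--device` value might be resolved to a torch device string, with a CPU fallback when torch or torch_npu is unavailable. The function name `resolve_device` is hypothetical.

```python
def resolve_device(requested):
    """Return a torch device string, falling back to CPU when the NPU is unusable."""
    if requested != "npu":
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the NPU backend)

        if torch.npu.is_available():
            return "npu:0"
    except ImportError:
        pass  # torch or torch_npu is missing: treat the host as CPU-only
    return "cpu"


print(resolve_device("cpu"))
```

Falling back silently is a design choice for this sketch. A stricter script could raise an error instead, so that an NPU benchmark never runs on the CPU by accident.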

6. Inference Results

Metric	CPU	NPU
Average inference latency	1820.63 ms	10.09 ms
Speedup	-	180.50x
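For reference, an average latency and speedup of this kind could be measured roughly as follows. This is a generic sketch, not the repository's benchmarking code: the warm-up count, run count, and function names are assumptions.

```python
import time


def average_ms(fn, warmup=3, runs=10):
    """Average wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up iterations are excluded from the measurement
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    # On an NPU, fn should synchronize the device (e.g. torch.npu.synchronize())
    # before returning; otherwise asynchronous kernels may still be running.
    return (time.perf_counter() - start) * 1000.0 / runs


def speedup(cpu_ms, npu_ms):
    """Ratio of CPU latency to NPU latency."""
    return cpu_ms / npu_ms


print(f"{speedup(1820.63, 10.09):.2f}x")
```

Dividing the rounded latencies from the table gives about 180.44x. The reported 180.50x was presumably computed from the unrounded timings.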

7. CPU/NPU Accuracy Test

Top-5 prediction comparison

Rank	CPU class	CPU probability	NPU class	NPU probability
1	974	0.004109	974	0.004144
2	405	0.004050	405	0.004068
3	700	0.003918	700	0.003931
4	680	0.003884	680	0.003885
5	769	0.003815	769	0.003833

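Accuracy metrics

Metric	Value
Logits max absolute error	1.2600e-02
Logits mean absolute error	2.6800e-03
Probability max absolute error	3.6900e-05
Probability mean absolute error	2.8100e-06
Cosine similarity	0.99998385
Probability relative error	0.3666%
Top-1 class match	Yes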
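The absolute-error, cosine-similarity, and Top-1 metrics in the table can be computed along these lines. This is a sketch on toy vectors, not the contents of `compare_cpu_npu.py`. The relative-error metric is left out because its exact definition is not stated here.

```python
import math


def compare(cpu, npu):
    """Element-wise error statistics between two equal-length vectors."""
    diffs = [abs(a - b) for a, b in zip(cpu, npu)]
    dot = sum(a * b for a, b in zip(cpu, npu))
    norm = math.sqrt(sum(a * a for a in cpu)) * math.sqrt(sum(b * b for b in npu))
    return {
        "max_abs_err": max(diffs),
        "mean_abs_err": sum(diffs) / len(diffs),
        "cosine": dot / norm,
        "top1_match": cpu.index(max(cpu)) == npu.index(max(npu)),
    }


cpu_logits = [1.0, 2.0, 3.0]     # toy values, not the measured logits
npu_logits = [1.01, 1.99, 3.02]
print(compare(cpu_logits, npu_logits))
```

Applied once to the logits and once to the Softmax probabilities, the same function would produce both pairs of absolute-error rows.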

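Conclusion

The relative error between NPU and CPU probabilities is 0.3666%, below the 1% threshold, so the accuracy alignment check passes. Cosine similarity is 0.99998385, and the Top-1 class is identical.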

8. Screenshot

[Screenshot: terminal output]

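9. Code Example

import os

import torch_npu  # noqa: F401  registers the 'npu' device with PyTorch
from modelscope import snapshot_download
from safetensors.torch import load_file
from timm import create_model

# Build the architecture only; the weights come from ModelScope.
model = create_model('tiny_vit_21m_512.dist_in22k_ft_in1k', pretrained=False)
local_path = snapshot_download('timm/tiny_vit_21m_512.dist_in22k_ft_in1k')
state_dict = load_file(os.path.join(local_path, 'model.safetensors'))
# strict=False tolerates key mismatches, so print the result to confirm nothing was skipped.
print(model.load_state_dict(state_dict, strict=False))
# Move to the first NPU in FP32 and switch to inference mode.
model = model.to('npu:0').float().eval()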

10. Dependencies

torch>=2.0.0, torchvision>=0.15.0, timm>=1.0.0, modelscope>=1.0.0, safetensors, Pillow

11. Tags

#+NPU #+CV #+ImageClassification #+Ascend #+TinyViT #+timm