timm/deit_tiny_distilled_patch16_224.fb_in1k on Ascend NPU

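1. Introduction

This project adapts the timm/deit_tiny_distilled_patch16_224.fb_in1k image classification model to a single Ascend NPU card (Ascend910).

Model type: Vision Transformer (DeiT Tiny Distilled)
Input size: 224x224
Output classes: 1000 (ImageNet-1k)
Adaptation approach: weights are downloaded with ModelScope snapshot_download; timm.create_model(pretrained=False) builds the model structure, and the local weights are then loaded into it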
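
The loading flow described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the repository's model_utils.py: the helper names `find_weight_file` and `load_local_model` are hypothetical, the ModelScope model ID is assumed to equal the model name above, the candidate weight filenames are guesses that should be checked against the downloaded snapshot, and the checkpoint keys are assumed to match timm's module names.

```python
from pathlib import Path

MODEL_ID = "timm/deit_tiny_distilled_patch16_224.fb_in1k"  # assumed ModelScope ID
ARCH = "deit_tiny_distilled_patch16_224"

# Assumed filenames; inspect the downloaded snapshot for the actual one.
CANDIDATE_WEIGHTS = ("model.safetensors", "pytorch_model.bin")


def find_weight_file(snapshot_dir):
    """Return the first candidate weight file that exists in snapshot_dir."""
    root = Path(snapshot_dir)
    for name in CANDIDATE_WEIGHTS:
        path = root / name
        if path.is_file():
            return path
    raise FileNotFoundError(f"no weight file among {CANDIDATE_WEIGHTS} in {root}")


def load_local_model(model_id=MODEL_ID, arch=ARCH):
    """Download weights through ModelScope and load them into a timm model."""
    # Heavy dependencies are imported here so the path helper above
    # stays usable on machines without them.
    import timm
    import torch
    from modelscope import snapshot_download

    snapshot_dir = snapshot_download(model_id)  # returns the local cache directory
    weight_path = find_weight_file(snapshot_dir)

    if weight_path.suffix == ".safetensors":
        from safetensors.torch import load_file
        state_dict = load_file(str(weight_path))
    else:
        state_dict = torch.load(weight_path, map_location="cpu")

    # pretrained=False builds the architecture only, so timm never
    # attempts a download of its own.
    model = timm.create_model(arch, pretrained=False)
    model.load_state_dict(state_dict)
    return model.eval()
```

Calling `load_local_model()` returns the model on the CPU in eval mode; moving it to the NPU is left to the caller.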

2. Verification Environment

Item	Version/Model
NPU	Ascend910
npu-smi	25.5.2
PyTorch NPU	Available
Device name	Ascend910_9362

3. Running Inference

pip install -r requirements.txt
python inference.py

The inference script loads the model through model_utils.py, resolves the preprocessing parameters automatically with timm.data.resolve_model_data_config, and runs real inference on npu:0.
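
As a rough illustration of that flow, the sketch below preprocesses one image with the model's own data config and runs a forward pass on the NPU. It is not the repository's inference.py: the function name `run_inference` and its parameters are hypothetical, and it assumes torch_npu, timm, and Pillow are installed and that `model` is a timm model already carrying its weights.

```python
DEVICE = "npu:0"


def run_inference(model, image_path, device=DEVICE):
    """Preprocess one image with the model's data config and run it on the NPU."""
    # Heavy imports are kept local to the function.
    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    import timm.data
    from PIL import Image

    # Derive input size, interpolation, mean and std from the model's config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0).to(device)  # shape [1, 3, 224, 224]

    model = model.to(device).eval()
    with torch.no_grad():
        logits = model(batch)  # shape [1, 1000]
    return logits.cpu()
```

The function deliberately has no CPU fallback: if the NPU device is unavailable, the `.to(device)` call fails instead of silently running elsewhere.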

Sample inference output:

Input shape: torch.Size([1, 3, 224, 224])
Output shape: torch.Size([1, 1000])
Top-5 predictions:
  class_978: 0.4575
  class_976: 0.2227
  class_972: 0.0433
  class_460: 0.0287
  class_974: 0.0260

4. Accuracy Verification

python eval_accuracy.py

CPU-NPU consistency is verified on a single test image:

Metric	Value
max_abs_error	0.023488
mean_abs_error	0.005529
relative_error	0.4294%
cosine_similarity	0.999992
threshold	1.0%
Result	PASS

The CPU and NPU Top-1 classes match.
The CPU and NPU Top-5 classes match.
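
For readers who want to reproduce this kind of check, the four metrics can be computed as sketched below. The definitions used by eval_accuracy.py are not stated, so two points are assumptions: relative_error is taken as the L2 norm of the difference divided by the L2 norm of the CPU output, and the 1.0% threshold is applied to that value. The function names are hypothetical.

```python
import math


def consistency_metrics(cpu_out, npu_out):
    """Compare two equally sized sequences of floats (e.g. flattened logits)."""
    if len(cpu_out) != len(npu_out) or not cpu_out:
        raise ValueError("outputs must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(cpu_out, npu_out)]
    cpu_norm = math.sqrt(sum(a * a for a in cpu_out))
    npu_norm = math.sqrt(sum(b * b for b in npu_out))
    if cpu_norm == 0.0 or npu_norm == 0.0:
        raise ValueError("all-zero output, relative metrics are undefined")
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    dot = sum(a * b for a, b in zip(cpu_out, npu_out))
    return {
        "max_abs_error": max(diffs),
        "mean_abs_error": sum(diffs) / len(diffs),
        # Assumed definition: ||cpu - npu||_2 / ||cpu||_2
        "relative_error": diff_norm / cpu_norm,
        "cosine_similarity": dot / (cpu_norm * npu_norm),
    }


def passes(metrics, threshold=0.01):
    """PASS when the relative error is below the threshold (assumed rule)."""
    return metrics["relative_error"] < threshold
```

With tensors, `cpu_logits.flatten().tolist()` and `npu_logits.cpu().flatten().tolist()` would produce suitable inputs.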

5. Performance Reference

python benchmark.py

Metric	Value
Mean latency	5.74 ms
Min latency	5.59 ms
Max latency	5.86 ms
P50 latency	5.75 ms
P90 latency	5.81 ms
P95 latency	5.83 ms
Throughput	174.34 images/sec
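
The reported throughput is consistent with a batch size of 1 divided by the mean latency (1000 / 5.74 ≈ 174 images/sec). The sketch below shows one way such statistics can be collected; it is not the repository's benchmark.py. The warm-up and iteration counts, the linear-interpolation percentile, and all function names are assumptions.

```python
import time


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def summarize(latencies_ms, batch_size=1):
    """Turn per-iteration latencies (ms) into latency and throughput statistics."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    mean = sum(ordered) / len(ordered)
    return {
        "mean_ms": mean,
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "p50_ms": percentile(ordered, 50),
        "p90_ms": percentile(ordered, 90),
        "p95_ms": percentile(ordered, 95),
        "images_per_sec": batch_size * 1000.0 / mean,
    }


def benchmark(step, sync=lambda: None, warmup=10, iters=100):
    """Time step() iters times; sync() should block until the device is idle."""
    for _ in range(warmup):
        step()
    sync()
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        sync()  # on an NPU this would be torch.npu.synchronize
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies_ms)
```

Passing a synchronization callback matters on asynchronous devices: without it the timer may stop before the queued kernels have finished, which understates latency.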

6. Accuracy Evaluation

This repository provides only a CPU-NPU smoke consistency check. Full ImageNet-1k accuracy must be evaluated separately on the standard dataset.

7. Self-Verification Screenshots

See screenshots/self_verification.png and screenshots/self_verification.txt.

8. Log Files

File	Contents
`logs/inference.log`	NPU inference results
`logs/accuracy.log`	CPU-NPU consistency comparison
`logs/benchmark.log`	Performance benchmark
`logs/env_check.log`	NPU environment check
`logs/paths.txt`	Model and weight paths

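9. Notes

The weight files are not committed to the repository; on first run they are downloaded automatically to the local cache through ModelScope snapshot_download.
Never use timm.create_model(..., pretrained=True), which downloads directly from HuggingFace.
No fallback, and no weights are committed.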

10. Tags

#NPU
