timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k on Ascend NPU

1. Overview

Model: timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k
Architecture: ViT-Base with patch size 32, input resolution 384x384
Pretraining: CLIP (OpenAI), fine-tuned on ImageNet-12k and then ImageNet-1k
Classification head: standard timm classification head (num_classes=1000)
Weight source: ModelScope snapshot_download
Adaptation: timm.create_model(pretrained=False) plus loading local safetensors weights
Device: single Ascend910B card (Ascend910_9362)
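
The adaptation path described above (build the architecture with pretrained=False, then load local safetensors weights) might look roughly like the sketch below. It is not the project's inference.py, which is not shown here. The helper names find_safetensors and load_model, the "npu:0" device string, and the ModelScope model ID in the trailing comment are assumptions made for illustration.

```python
from pathlib import Path

MODEL_NAME = "vit_base_patch32_clip_384.openai_ft_in12k_in1k"


def find_safetensors(snapshot_dir):
    """Return the first *.safetensors file under snapshot_dir (sorted for determinism)."""
    candidates = sorted(Path(snapshot_dir).rglob("*.safetensors"))
    if not candidates:
        raise FileNotFoundError(f"no .safetensors file under {snapshot_dir}")
    return candidates[0]


def load_model(snapshot_dir, device="npu:0"):
    """Build the architecture without downloading, then load local weights."""
    # Heavy imports stay local so find_safetensors is usable without them.
    import timm
    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    from safetensors.torch import load_file

    # pretrained=False: timm only constructs the network, no weight download.
    model = timm.create_model(MODEL_NAME, pretrained=False, num_classes=1000)
    state_dict = load_file(str(find_safetensors(snapshot_dir)))
    model.load_state_dict(state_dict)
    return model.eval().to(torch.device(device))


# Example usage (the ModelScope model ID below is an assumption):
#   from modelscope import snapshot_download
#   snapshot_dir = snapshot_download("timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k")
#   model = load_model(snapshot_dir)
```

Keeping the download step separate from model construction makes it easy to point load_model at a directory that was populated offline.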

2. Verification Environment

Item	Value
NPU	Ascend910_9362
PyTorch	torch + torch_npu
timm	Latest
Input size	384 x 384
Output shape	[1, 1000]

3. Running Inference

pip install -r requirements.txt
python inference.py

Top-5 predictions printed by the script:

Rank	Class	Probability
1	class_814	12.37%
2	class_576	10.92%
3	class_914	8.58%
4	class_978	7.87%
5	class_975	7.67%
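
The class_N labels suggest that the script reports raw class indices with softmax probabilities. As a minimal sketch of how such a Top-5 table can be derived from a logit vector (the function names softmax and top_k are illustrative, and the toy logits are made up; this is not the actual inference.py):

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return [(class_index, probability), ...] for the k most likely classes."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    toy_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0]  # stand-in for the 1000 logits
    for rank, (idx, p) in enumerate(top_k(toy_logits, k=3), start=1):
        print(f"{rank}\tclass_{idx}\t{p:.2%}")
```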

4. Accuracy Verification

python eval_accuracy.py

CPU/NPU consistency is checked on a single test image:

Metric	Value
max_abs_error	0.016526
mean_abs_error	0.003962
relative_error	0.5604%
cosine_similarity	0.999987
threshold	1.0%
Result	PASS

The CPU and NPU Top-1 classes match.
The CPU and NPU Top-5 classes match.
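
The metrics in the table can be computed from the two logit vectors along the following lines. This is a sketch, not the contents of eval_accuracy.py: the exact definition of relative_error used by the script is not stated, so the L2-norm ratio below is an assumption, as is the choice to apply the 1.0% threshold to that metric. It also assumes the reference logits are not all zero.

```python
import math


def consistency_metrics(cpu_logits, npu_logits):
    """Compare two equal-length logit vectors; cpu_logits is the reference."""
    if len(cpu_logits) != len(npu_logits):
        raise ValueError("logit vectors must have the same length")
    diffs = [abs(c - n) for c, n in zip(cpu_logits, npu_logits)]
    cpu_norm = math.sqrt(sum(c * c for c in cpu_logits))
    npu_norm = math.sqrt(sum(n * n for n in npu_logits))
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    dot = sum(c * n for c, n in zip(cpu_logits, npu_logits))
    return {
        "max_abs_error": max(diffs),
        "mean_abs_error": sum(diffs) / len(diffs),
        # Assumed definition: ||npu - cpu||_2 / ||cpu||_2.
        "relative_error": diff_norm / cpu_norm,
        "cosine_similarity": dot / (cpu_norm * npu_norm),
    }


def passes(metrics, threshold=0.01):
    """PASS when the relative error is below the threshold (assumed to be 1.0%)."""
    return metrics["relative_error"] < threshold
```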

5. Performance Reference

Metric	Value
Avg latency	5.61 ms
Min latency	5.43 ms
Max latency	5.68 ms
P50	5.64 ms
P90	5.68 ms
P95	5.68 ms
Throughput	178.19 images/s

Test conditions: 2 warmup runs followed by 10 timed runs, batch=1, single NPU card.
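
A benchmark loop matching these test conditions could be sketched as follows. The names percentile, summarize, and benchmark are illustrative, and the nearest-rank percentile method is an assumption; the project's benchmark script may compute percentiles differently. At batch size 1, throughput is simply 1000 divided by the average latency in milliseconds.

```python
import math
import time


def percentile(sorted_values, q):
    """Nearest-rank percentile (q in 0..100) of an ascending list."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Aggregate per-iteration latencies (milliseconds) into summary statistics."""
    ordered = sorted(latencies_ms)
    avg = sum(ordered) / len(ordered)
    return {
        "avg_ms": avg,
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "p50_ms": percentile(ordered, 50),
        "p90_ms": percentile(ordered, 90),
        "p95_ms": percentile(ordered, 95),
        "throughput_per_s": 1000.0 / avg,  # valid for batch size 1
    }


def benchmark(run_once, warmup=2, iters=10):
    """Time run_once() after warmup; run_once must block until the result is ready."""
    for _ in range(warmup):
        run_once()
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies_ms)
```

Because device execution is asynchronous, run_once should end with a synchronization call (for example torch.npu.synchronize() with torch_npu) so that the timer measures completed work rather than kernel launch time. With only 10 timed runs, nearest-rank P95 coincides with the maximum, which is consistent with the table.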

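6. Accuracy Evaluation

This project only provides a smoke consistency check (comparing CPU and NPU logits); it does not run the official ImageNet accuracy evaluation. For accuracy figures, run a full evaluation on the ImageNet validation set.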
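
For readers who do run a full evaluation, the core Top-1/Top-5 bookkeeping is small. The sketch below is not part of this project; the function name topk_accuracy and its input layout (per-sample class indices ranked from most to least likely) are assumptions chosen for illustration. Data loading and preprocessing for the ImageNet validation set are left out.

```python
def topk_accuracy(ranked_predictions, labels, k=1):
    """Fraction of samples whose true label is among the first k predicted classes.

    ranked_predictions: per-sample lists of class indices, most likely first.
    labels: per-sample ground-truth class indices.
    """
    if len(ranked_predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("empty evaluation set")
    hits = sum(
        1
        for ranked, label in zip(ranked_predictions, labels)
        if label in ranked[:k]
    )
    return hits / len(labels)
```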

7. Self-Verification Screenshots

See screenshots/self_verification.png and screenshots/self_verification.txt.

8. Log Files

File	Description
`logs/inference.log`	Inference results log
`logs/accuracy.log`	Accuracy consistency log
`logs/benchmark.log`	Performance benchmark log
`logs/env_check.log`	Environment check log

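9. Notes

Weight files (*.safetensors, *.bin) are excluded via .gitignore and are not committed to the repository.
The model is downloaded through ModelScope snapshot_download rather than directly from HuggingFace.
pretrained=False ensures that no automatic download is triggered.
The 384x384 input resolution is larger than the standard 224x224, so inference latency is correspondingly higher.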
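
To make the resolution note concrete, the number of tokens the transformer processes can be counted directly. This is a small worked calculation, assuming square inputs and a single class token; the function name vit_token_count is illustrative.

```python
def vit_token_count(image_size, patch_size, class_token=True):
    """Number of tokens a ViT processes for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches = (image_size // patch_size) ** 2
    return patches + (1 if class_token else 0)


if __name__ == "__main__":
    print(vit_token_count(224, 32))  # 49 patches + 1 class token
    print(vit_token_count(384, 32))  # 144 patches + 1 class token
```

With patch size 32, a 384x384 input yields 145 tokens versus 50 at 224x224, roughly 2.9 times as many, which is why per-image compute and latency grow with the larger resolution.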

10. Tags

#NPU
