timm/regnety_080_tv.tv2_in1k on Ascend NPU

1. Introduction

This project adapts the timm/regnety_080_tv.tv2_in1k image classification model to run on Huawei Ascend NPUs (Ascend910B).

Model source: ModelScope community
Model page: https://www.modelscope.cn/models/timm/regnety_080_tv.tv2_in1k
Model type: image classification (ImageNet-1k, 1000 classes)
Adaptation method: timm.create_model(pretrained=False) + loading local weights downloaded from ModelScope
Preprocessing: timm's official resolve_model_data_config + create_transform
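
The adaptation method and preprocessing listed above can be sketched roughly as follows. This is a minimal sketch, not the project's actual inference.py: the function name load_model_and_transform, the default device string "npu:0", and the assumption that the timm architecture name is the ModelScope repo id without the "timm/" prefix are all illustrative. The file name model.safetensors is the one this README names as the local weight file.

```python
from pathlib import Path

MODEL_ID = "timm/regnety_080_tv.tv2_in1k"  # ModelScope repo id from the model page
ARCH = "regnety_080_tv.tv2_in1k"           # assumed timm name: repo id without "timm/"


def load_model_and_transform(device: str = "npu:0"):
    """Build the model with pretrained=False and load weights from a local snapshot.

    Hypothetical helper for illustration only.
    """
    # Heavy imports stay inside the function so the module imports anywhere.
    import timm
    import torch_npu  # noqa: F401  (registers the "npu" device with torch)
    from modelscope import snapshot_download
    from safetensors.torch import load_file
    from timm.data import create_transform, resolve_model_data_config

    # Download (or reuse) the ModelScope snapshot, then read the weights from disk.
    weights_dir = Path(snapshot_download(MODEL_ID))
    state_dict = load_file(str(weights_dir / "model.safetensors"))

    # pretrained=False avoids any automatic download from the HuggingFace Hub.
    model = timm.create_model(ARCH, pretrained=False)
    model.load_state_dict(state_dict)
    model = model.eval().to(device)

    # Derive the eval-time preprocessing from the model's own data config.
    data_config = resolve_model_data_config(model)
    transform = create_transform(**data_config, is_training=False)
    return model, transform
```

A caller would typically apply transform to a PIL image, add a batch dimension with unsqueeze(0), move the tensor to the same device, and run the model under torch.no_grad().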

2. Verification Environment

Component	Version
NPU model	Ascend910B4
CANN	8.5.1
Python	3.11.14
torch	2.9.0+cpu
torchvision	0.24.0
torch_npu	2.9.0.post1+gitee7ba04
transformers	4.57.6
timm	1.0.27
modelscope	1.35.3
safetensors	0.7.0
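
To compare a local environment against the table above, the Python-side versions can be collected with the standard library. This is a hedged sketch: the helper name collect_versions and the package list are assumptions for illustration, and it does not cover the NPU model or the CANN version, which are not Python packages.

```python
import platform
from importlib import metadata

# Package names taken from the table above.
PACKAGES = [
    "torch", "torchvision", "torch_npu", "transformers",
    "timm", "modelscope", "safetensors",
]


def collect_versions(packages=PACKAGES):
    """Return {name: version}, using "not installed" for missing packages."""
    versions = {"Python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print in the same tab-separated layout as the table above.
    for name, version in collect_versions().items():
        print(f"{name}\t{version}")
```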

3. Running Inference

Install dependencies

pip install -r requirements.txt

torch_npu is usually preinstalled by the Ascend container or environment, so requirements.txt does not pin its version.

Run inference

python inference.py

Inference log: logs/inference.log
Prediction results: logs/prediction.txt

Run accuracy verification

python eval_accuracy.py

Log: logs/accuracy.log

Run the performance benchmark

python benchmark.py

Log: logs/benchmark.log

4. Smoke Verification

torch.npu.is_available(): True
torch.npu.get_device_name(0): Ascend910B4
NPU inference output shape: [1, 1000]
Top-1 prediction: class_851 (prob=0.269087)
Top-5 predictions:
1. class_851: 0.269087
2. class_782: 0.058281
3. class_664: 0.052266
4. class_717: 0.020666
5. class_837: 0.010222

The model does not provide an id2label mapping, so class_N is used as a placeholder label.
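
The placeholder-label fallback can be illustrated with a small pure-Python sketch. The helper names softmax and top_k are hypothetical, and the real script presumably works on torch tensors rather than Python lists; the point here is only the class_N fallback when no id2label entry exists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5, id2label=None):
    """Return the k most probable (label, prob) pairs.

    Falls back to the placeholder "class_<index>" when id2label has no entry.
    """
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mapping = id2label or {}
    return [(mapping.get(i, f"class_{i}"), probs[i]) for i in order]
```

Passing a dictionary from class index to name as id2label replaces the placeholders for the indices it covers.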

5. Performance Reference

Benchmark results on a single Ascend910B4 card (batch=1, input 3x224x224):

Metric	Value
avg	19.858 ms
min	17.913 ms
max	24.429 ms
p50	19.046 ms
p90	23.709 ms
p95	23.709 ms
images/sec	50.36

Performance figures are for reference only; actual values depend on the CANN version, driver, host load, and other factors.
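
For readers who want to reproduce the summary rows from raw per-iteration latencies, here is a minimal sketch. It assumes a nearest-rank percentile definition; benchmark.py may use a different one, and the helper names percentile and summarize are hypothetical. Throughput follows from the average latency: at batch=1, 1000 / 19.858 ms is about 50.36 images/sec, matching the table.

```python
import math
import statistics


def percentile(sorted_ms, q):
    """Nearest-rank percentile of an ascending list (assumed definition)."""
    rank = max(1, math.ceil(q / 100 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def summarize(latencies_ms, batch_size=1):
    """Summarize per-iteration latencies given in milliseconds."""
    data = sorted(latencies_ms)
    avg = statistics.fmean(data)
    return {
        "avg": avg,
        "min": data[0],
        "max": data[-1],
        "p50": percentile(data, 50),
        "p90": percentile(data, 90),
        "p95": percentile(data, 95),
        # images per second = images per iteration / seconds per iteration
        "images/sec": batch_size * 1000.0 / avg,
    }
```

Under the nearest-rank definition, p90 and p95 select the same measurement whenever the sample count is small enough that both ranks round up to the same index.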

6. Accuracy Evaluation

CPU vs NPU smoke consistency results:

Metric	Value
max_abs_diff (logits)	4.34e-03
mean_abs_diff (logits)	4.25e-04
prob_max_diff	1.43e-04
Top-1 match	True
Top-5 match	True

This is a CPU/NPU smoke consistency test, not an accuracy measurement on the official ImageNet dataset.
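
The metrics in the table above can be computed along the following lines. This is a sketch under stated assumptions, not the contents of eval_accuracy.py: the function name compare_outputs is hypothetical, inputs are plain lists of logits for one image, and "Top-5 match" is assumed to mean the same five indices in the same order.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_indices(values, k):
    """Indices of the k largest values, in descending order of value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def compare_outputs(cpu_logits, npu_logits, k=5):
    """Compare the CPU and NPU logits of a single image."""
    diffs = [abs(a - b) for a, b in zip(cpu_logits, npu_logits)]
    cpu_probs, npu_probs = softmax(cpu_logits), softmax(npu_logits)
    cpu_top, npu_top = top_indices(cpu_logits, k), top_indices(npu_logits, k)
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "prob_max_diff": max(abs(p - q) for p, q in zip(cpu_probs, npu_probs)),
        "top1_match": cpu_top[0] == npu_top[0],
        # Assumption: a match requires identical indices in identical order.
        "top5_match": cpu_top == npu_top,
    }
```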

7. Self-Verification Screenshots

See the screenshots/ directory:

screenshots/self_verification.txt: text summary of the key logs
screenshots/self_verification.png: screenshot of the verification results

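8. Log Files

File	Description
`logs/model_check.log`	Model pre-check and download check
`logs/env_check.log`	Environment check (NPU, Python, package versions)
`logs/inference.log`	Inference run log
`logs/prediction.txt`	Top-5 prediction results
`logs/accuracy.log`	CPU vs NPU accuracy comparison
`logs/benchmark.log`	Performance benchmark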

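9. Notes

Weight loading: never call timm.create_model(..., pretrained=True). Download the weights locally with ModelScope snapshot_download and load them from disk, which avoids triggering an automatic download from the HuggingFace Hub.
Weight file: the local weight file is model.safetensors (151 MB).
Preprocessing: uses timm's official resolve_model_data_config + create_transform, consistent with the preprocessing used at training time.
Labels: the model ships without an id2label file, so class_0 through class_999 serve as placeholder labels. Map them yourself if you need the real ImageNet label names.
Device/host memory: the model is about 151 MB and single-card inference uses about 3 GB of device memory, so OOM is generally not expected.
Test image: assets/test.jpg comes from the public image host picsum.photos; if the network is unavailable, a placeholder image is used instead (see assets/test_image_note.txt).
torch_npu artifacts: a run may produce fusion_result.json and kernel_meta/; both are listed in .gitignore and will not be committed.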

10. Tags

#NPU