timm/resnetv2_152x4_bit.goog_in21k_ft_in1k on Ascend NPU

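1. Introduction

This project adapts timm/resnetv2_152x4_bit.goog_in21k_ft_in1k (a 4x-width ResNetV2-152 with BiT, pretrained on ImageNet-21K and fine-tuned on ImageNet-1K, 936.5M parameters) to the Huawei Ascend NPU (Ascend910). Weights are downloaded with ModelScope snapshot_download and loaded from the local copy via timm.create_model(pretrained=False). The project includes inference verification, a CPU-NPU accuracy consistency check, and a performance benchmark.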

2. Verification environment

Hardware: Huawei Ascend 910B NPU
OS: Linux 5.10.0
Python: 3.x
PyTorch + torch_npu
npu-smi: Ascend910 OK

3. Running inference

cd timm-resnetv2_152x4_bit.goog_in21k_ft_in1k-NPU
pip install -r requirements.txt
python download_test_image.py
python inference.py
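
inference.py itself is not reproduced here. As a rough sketch of the loading path described in section 1, the snippet below downloads a snapshot with ModelScope and passes the local weight file to timm. The ModelScope repository id, the find_checkpoint helper, and the use of timm's checkpoint_path argument are assumptions made for illustration, not necessarily what the project's script does.

```python
from pathlib import Path

# Assumption: the ModelScope repo id mirrors the model name used in this README.
MODEL_ID = "timm/resnetv2_152x4_bit.goog_in21k_ft_in1k"
ARCH = "resnetv2_152x4_bit.goog_in21k_ft_in1k"


def find_checkpoint(model_dir):
    """Hypothetical helper: return a weight file, preferring *.safetensors over *.bin."""
    root = Path(model_dir)
    for pattern in ("*.safetensors", "*.bin"):
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no *.safetensors or *.bin file in {root}")


def load_model(device="npu:0"):
    """Download the snapshot, build the architecture, and load the local weights."""
    # Heavy imports stay local so find_checkpoint works without them installed.
    import timm
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    from modelscope import snapshot_download

    model_dir = snapshot_download(MODEL_ID)
    model = timm.create_model(
        ARCH,
        pretrained=False,  # timm downloads nothing; weights come from model_dir
        checkpoint_path=str(find_checkpoint(model_dir)),
    )
    return model.eval().to(device)
```

Calling load_model() requires timm, torch_npu, and modelscope to be installed and an NPU to be visible; find_checkpoint can be tried on any local directory.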

Inference results (NPU):

Top-1: class_733 (94.02%)
Top-2: class_557 (1.68%)
Top-3: class_862 (0.59%)
Top-4: class_708 (0.58%)
Top-5: class_919 (0.58%)
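
A ranking like the one above can be produced from raw logits with a softmax followed by a top-k selection. The sketch below assumes the percentages are softmax probabilities over the 1000 ImageNet-1K classes, which this README does not state explicitly; the function names and the demo values are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return [(class_index, probability), ...] for the k most probable classes."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


if __name__ == "__main__":
    demo_logits = [0.1, 2.0, -1.0, 0.5]  # made-up values, not model output
    for rank, (idx, p) in enumerate(top_k(demo_logits, k=3), start=1):
        print(f"Top-{rank}: class_{idx} ({p * 100:.2f}%)")
```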

4. Accuracy verification

CPU vs. NPU consistency check on a single test image:

Metric	Value
max_abs_error	0.008362
mean_abs_error	0.001277
relative_error	0.0903%
cosine_similarity	1.000000
threshold	1.0%
Result	PASS

CPU Top-1: class_733
NPU Top-1: class_733
CPU Top-5: [733, 557, 862, 708, 919]
NPU Top-5: [733, 557, 862, 708, 919]
Top-1 match: True
Top-5 match: True
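
For readers who want to reproduce this kind of check, the metrics in the table can be computed from the two output vectors roughly as follows. The exact definition of relative_error used by the project is not given here; the sketch assumes mean absolute error divided by the mean absolute value of the CPU reference, and assumes the 1.0% threshold applies to that quantity.

```python
import math


def consistency_metrics(cpu, npu):
    """Compare two equal-length output vectors; cpu is treated as the reference."""
    if len(cpu) != len(npu) or not cpu:
        raise ValueError("inputs must be non-empty and the same length")
    abs_err = [abs(a - b) for a, b in zip(cpu, npu)]
    mean_abs_err = sum(abs_err) / len(abs_err)
    mean_abs_ref = sum(abs(a) for a in cpu) / len(cpu)
    dot = sum(a * b for a, b in zip(cpu, npu))
    norm = math.sqrt(sum(a * a for a in cpu)) * math.sqrt(sum(b * b for b in npu))
    return {
        "max_abs_error": max(abs_err),
        "mean_abs_error": mean_abs_err,
        # Assumed definition: mean absolute error relative to mean |reference|.
        "relative_error": mean_abs_err / mean_abs_ref if mean_abs_ref else float("inf"),
        "cosine_similarity": dot / norm if norm else float("nan"),
    }


def passes(metrics, threshold=0.01):
    """PASS when the (assumed) relative error is below the threshold (1.0% here)."""
    return metrics["relative_error"] < threshold
```

Under this definition, a relative error of 0.0903% sits well below the 1.0% threshold, which is consistent with the PASS result above.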

5. Performance reference

Metric	Value
avg latency	80.91 ms
min latency	77.69 ms
max latency	99.77 ms
p50 latency	77.81 ms
p90 latency	85.63 ms
p95 latency	92.70 ms
throughput	12.36 images/sec
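
The reported throughput equals 1000 / 80.91 ≈ 12.36 images/sec, which suggests (but does not confirm) that it was derived from the average latency at batch size 1. A minimal sketch of how such a table could be produced is shown below. The warmup count, the linear-interpolation percentile, and the sync hook are assumptions; on an NPU one would typically pass torch.npu.synchronize as sync so that asynchronous device work is included in each measurement.

```python
import time


def time_calls(fn, iters=100, warmup=10, sync=None):
    """Return per-call latencies of fn() in milliseconds."""
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync:
            sync()  # wait for device work to finish before stopping the clock
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(sorted_ms, q):
    """Linear-interpolation percentile (q in [0, 100]) of an ascending list."""
    pos = (len(sorted_ms) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_ms) - 1)
    return sorted_ms[lo] + (sorted_ms[hi] - sorted_ms[lo]) * (pos - lo)


def summarize(latencies_ms, batch_size=1):
    """Aggregate latencies (ms) into the metrics listed in the table above."""
    ordered = sorted(latencies_ms)
    avg = sum(ordered) / len(ordered)
    return {
        "avg": avg,
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p95": percentile(ordered, 95),
        "throughput": batch_size * 1000.0 / avg,  # images per second
    }
```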

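6. Notes on accuracy evaluation

This project includes only a single-image smoke consistency check, not an evaluation on the full official ImageNet validation set. See section 4 for the detailed metrics.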

7. Self-verification screenshot

See screenshots/self_verification.png

8. Log files

logs/inference.log - inference results
logs/accuracy.log - accuracy verification
logs/benchmark.log - performance benchmark
logs/env_check.log - environment check

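9. Notes

The model has 936.5M parameters; the weight file is about 3.5GB and takes about 7 minutes to download.
ModelScope snapshot_download is the primary download method.
Weights are not downloaded directly from HuggingFace.
Weight files (*.safetensors, *.bin) are not committed to the repository.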

10. Tags

#NPU #Ascend #Ascend910 #ImageClassification #ResNetV2 #BiT
