timm/tf_efficientnet_lite1.in1k on Ascend NPU

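1. Introduction

This project adapts the timm/tf_efficientnet_lite1.in1k image classification model to a single-card Huawei Ascend NPU (Ascend910) environment. Weights are downloaded with ModelScope snapshot_download and loaded from the local copy into a model created with timm.create_model(pretrained=False), so no direct connection to HuggingFace is required.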
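
The loading flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual inference.py: the ModelScope model ID, the candidate weight file names, the device string, and the helper names find_weight_file and load_local_model are all hypothetical and chosen for illustration.

```python
import os

# Assumed candidate weight file names; the actual file name in the
# downloaded snapshot may differ.
CANDIDATE_WEIGHTS = ("model.safetensors", "pytorch_model.bin")


def find_weight_file(snapshot_dir):
    """Return the first candidate weight file found in snapshot_dir."""
    for name in CANDIDATE_WEIGHTS:
        path = os.path.join(snapshot_dir, name)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"no weight file found in {snapshot_dir}")


def load_local_model(model_id="timm/tf_efficientnet_lite1.in1k", device="npu:0"):
    """Download weights via ModelScope and load them into a timm model."""
    # Heavy imports stay local so find_weight_file is usable without them.
    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    import timm
    from modelscope import snapshot_download

    snapshot_dir = snapshot_download(model_id)  # assumed model ID on ModelScope
    weight_path = find_weight_file(snapshot_dir)

    # pretrained=False builds the architecture only; no weights are fetched here.
    model = timm.create_model("tf_efficientnet_lite1", pretrained=False)
    if weight_path.endswith(".safetensors"):
        from safetensors.torch import load_file
        state_dict = load_file(weight_path)
    else:
        state_dict = torch.load(weight_path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model.eval().to(device)
```

Calling load_local_model() on a machine with torch_npu installed would return the model in eval mode on the NPU; on other machines the device argument could be set to "cpu".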

2. Verification Environment

Device: Huawei Ascend NPU (Ascend910)
Framework: PyTorch + torch_npu
Model: tf_efficientnet_lite1 (EfficientNet-Lite family)
Input shape: [1, 3, 240, 240]
Output shape: [1, 1000]

3. Running Inference

cd timm-tf_efficientnet_lite1.in1k-NPU
pip install -r requirements.txt
python inference.py

Inference results (NPU Top-5):

class_977 (0.11%)
class_979 (0.11%)
class_785 (0.11%)
class_649 (0.11%)
class_973 (0.11%)
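
A Top-5 listing in this format can be produced from the 1000-element output vector roughly as shown below. This is a sketch only: it assumes the model output is a vector of logits converted to probabilities with softmax, and that each label is simply the class index prefixed with class_. The function names are hypothetical.

```python
import math


def top_k_predictions(logits, k=5):
    """Return [(label, probability)] for the k highest-scoring classes."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(f"class_{i}", probs[i]) for i in ranked]


def format_prediction(label, prob):
    """Format one prediction as 'label (percent%)' with two decimals."""
    return f"{label} ({prob * 100:.2f}%)"
```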

4. Accuracy Verification

CPU/NPU consistency was verified on a single test image:

Metric	Value
max_abs_error	0.000379
mean_abs_error	0.000088
relative_error	0.2726%
cosine_similarity	0.999998
threshold	1.0%
Result	PASS

CPU Top-1: class_977
NPU Top-1: class_977
CPU Top-5: class_977, class_979, class_785, class_649, class_973
NPU Top-5: class_977, class_979, class_785, class_649, class_973
Top-1 match: True
Top-5 match: True
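
Metrics of this kind can be computed from the two output vectors along the following lines. This is a minimal sketch, not the project's verification script: the exact definition of relative_error is not stated in this document, so the one used here (total absolute error divided by total absolute reference magnitude) is an assumption, as is comparing relative_error against the threshold to decide PASS.

```python
import math


def consistency_metrics(cpu_out, npu_out, threshold=0.01):
    """Compare two equal-length output vectors, using the CPU output as reference."""
    if len(cpu_out) != len(npu_out) or not cpu_out:
        raise ValueError("outputs must be non-empty and the same length")
    diffs = [abs(c - n) for c, n in zip(cpu_out, npu_out)]
    max_abs_error = max(diffs)
    mean_abs_error = sum(diffs) / len(diffs)
    # Assumed definition: total absolute error over total reference magnitude.
    relative_error = sum(diffs) / (sum(abs(c) for c in cpu_out) + 1e-12)
    dot = sum(c * n for c, n in zip(cpu_out, npu_out))
    norm_cpu = math.sqrt(sum(c * c for c in cpu_out))
    norm_npu = math.sqrt(sum(n * n for n in npu_out))
    cosine_similarity = dot / (norm_cpu * norm_npu + 1e-12)
    return {
        "max_abs_error": max_abs_error,
        "mean_abs_error": mean_abs_error,
        "relative_error": relative_error,
        "cosine_similarity": cosine_similarity,
        "passed": relative_error < threshold,  # assumed pass criterion
    }
```

With PyTorch tensors, the outputs could be flattened and converted with .tolist() before being passed in.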

5. Performance Reference

Metric	Value
avg latency	6.72 ms
min latency	6.68 ms
max latency	6.78 ms
p50 latency	6.72 ms
p90 latency	6.77 ms
p95 latency	6.77 ms
throughput	148.81 images/sec
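
The reported throughput is consistent with batch size 1 and the average latency: 1000 / 6.72 ms is about 148.81 images/sec. A summary like the table above could be derived from per-iteration latencies as sketched below. The nearest-rank percentile convention and the function name summarize_latencies are assumptions; the project's benchmark script may compute percentiles differently.

```python
import math
import statistics


def summarize_latencies(latencies_ms, batch_size=1):
    """Summarize per-iteration latencies (milliseconds) into benchmark metrics."""
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile (an assumed convention).
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    avg = statistics.fmean(ordered)
    return {
        "avg": avg,
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentile(50),
        "p90": percentile(90),
        "p95": percentile(95),
        # images per second = images per batch / seconds per batch
        "throughput": batch_size * 1000.0 / avg,
    }
```

When timing on an NPU, each measurement would normally need a device synchronization (for example torch.npu.synchronize() from torch_npu) before the timer is stopped, so that asynchronous kernel launches are not undercounted.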

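6. Notes on Accuracy Evaluation

This project includes only a single-image smoke consistency check, not an evaluation on the full official ImageNet validation set. See Section 4 for the detailed metrics.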

7. Self-Verification Screenshot

See screenshots/self_verification.png

8. Log Files

logs/inference.log: inference results log
logs/accuracy.log: accuracy verification log
logs/benchmark.log: performance benchmark log

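9. Notes

Weights are downloaded with ModelScope snapshot_download; HuggingFace is not accessed directly
The model is created with timm.create_model(pretrained=False) and then loaded with the local weights
This project does not commit any weight files (.bin, .safetensors, .pth, .pt, .ckpt, .onnx)
The input resolution is 240x240 (the standard input size for EfficientNet-Lite1)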

10. Tags #NPU
