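This project adapts the timm/tf_efficientnet_b3.ap_in1k image classification model to the Huawei Ascend NPU (Ascend910). Weights are downloaded with ModelScope snapshot_download; the model structure is created with timm.create_model(pretrained=False), and the local weights are then loaded into it. The project includes inference verification, a CPU-NPU accuracy consistency check, and a performance benchmark.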
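The loading flow described above could look roughly like the following sketch. It is not the project's actual script. It assumes that the ModelScope repository ID matches the model name, that the snapshot contains a checkpoint named `model.safetensors` or `pytorch_model.bin` (typical for timm repositories, but not confirmed here), and that `torch_npu` is installed. The helper names `find_weight_file` and `load_local_model` are hypothetical.

```python
from pathlib import Path

# Assumed checkpoint file names, in order of preference (typical for timm repos).
CANDIDATE_WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def find_weight_file(model_dir):
    """Return the first known checkpoint file found in model_dir (hypothetical helper)."""
    for name in CANDIDATE_WEIGHT_FILES:
        path = Path(model_dir) / name
        if path.is_file():
            return path
    raise FileNotFoundError(f"no checkpoint file found in {model_dir}")


def load_local_model(model_id="timm/tf_efficientnet_b3.ap_in1k", device="npu:0"):
    """Download weights via ModelScope, build the timm model, and load the weights."""
    # Heavy imports stay local so find_weight_file works without them.
    import timm
    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    from modelscope import snapshot_download

    model_dir = snapshot_download(model_id)  # returns the local snapshot directory
    weight_file = find_weight_file(model_dir)

    # pretrained=False builds the architecture only; timm downloads nothing.
    model = timm.create_model("tf_efficientnet_b3.ap_in1k", pretrained=False)

    if weight_file.suffix == ".safetensors":
        from safetensors.torch import load_file
        state_dict = load_file(str(weight_file))
    else:
        state_dict = torch.load(str(weight_file), map_location="cpu")

    model.load_state_dict(state_dict)
    return model.eval().to(device)
```

Loading the state dict on the CPU first and moving the model to the NPU afterwards keeps the same code path usable for the CPU reference run (pass `device="cpu"`).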
Running `python inference.py` prints the Top-5 predicted classes with their probabilities and writes the log to logs/inference.log.
=== Inference Result ===
Model: timm/tf_efficientnet_b3.ap_in1k
Device: npu:0
Output shape: torch.Size([1, 1000])
Top-1: class_405 (prob=0.0057)
Top-2: class_600 (prob=0.0049)
Top-3: class_701 (prob=0.0048)
Top-4: class_623 (prob=0.0045)
Top-5: class_895 (prob=0.0044)
Full logits (first 10): [-0.45889613032341003, 0.050226159393787384, 0.22011253237724304, 0.11555498093366623, 0.4635002613067627, 0.12647953629493713, -0.08388809859752655, -0.07328705489635468, -0.3965292274951935, 0.3011126220226288]

CPU-NPU consistency check on a single test image:
| Metric | Value |
|---|---|
| max_abs_error | 0.001079 |
| mean_abs_error | 0.000197 |
| relative_error | 0.0467% |
| cosine_similarity | 1.000000 |
| threshold | 1.0% |
| Result | PASS |
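The metrics in the table above can be computed along the following lines. This is a sketch in plain Python over flattened output values, not the project's accuracy script. In particular, the definition of `relative_error` used here (L2 norm of the difference divided by L2 norm of the CPU reference, in percent) is an assumption, as is comparing it against the 1.0% threshold to decide PASS or FAIL. The function name `consistency_metrics` is hypothetical.

```python
import math


def consistency_metrics(cpu_out, npu_out, threshold_pct=1.0):
    """Compare two equal-length sequences of floats; cpu_out is the reference."""
    if len(cpu_out) != len(npu_out) or not cpu_out:
        raise ValueError("outputs must be non-empty and of equal length")

    abs_err = [abs(c - n) for c, n in zip(cpu_out, npu_out)]
    max_abs_error = max(abs_err)
    mean_abs_error = sum(abs_err) / len(abs_err)

    cpu_norm = math.sqrt(sum(c * c for c in cpu_out))
    npu_norm = math.sqrt(sum(n * n for n in npu_out))
    diff_norm = math.sqrt(sum(e * e for e in abs_err))

    # Assumed definition: ||cpu - npu||_2 / ||cpu||_2, expressed in percent.
    relative_error_pct = 100.0 * diff_norm / max(cpu_norm, 1e-12)

    dot = sum(c * n for c, n in zip(cpu_out, npu_out))
    cosine_similarity = dot / max(cpu_norm * npu_norm, 1e-12)

    return {
        "max_abs_error": max_abs_error,
        "mean_abs_error": mean_abs_error,
        "relative_error_pct": relative_error_pct,
        "cosine_similarity": cosine_similarity,
        "passed": relative_error_pct <= threshold_pct,
    }
```

With tensors, the same inputs could be obtained by running the model once on each device and flattening both logit tensors to lists, for example with `tensor.flatten().tolist()`.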

Performance benchmark results:

| Metric | Value |
|---|---|
| avg_latency | 11.85 ms |
| min_latency | 11.50 ms |
| max_latency | 12.11 ms |
| p50_latency | 11.87 ms |
| p90_latency | 12.11 ms |
| p95_latency | 12.11 ms |
| throughput | 84.37 images/sec |
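Latency statistics like those in the table above can be gathered with a small timing harness such as the sketch below. It is not the project's benchmark script. The warm-up count, the iteration count, and the nearest-rank percentile method are assumptions, and the function names are hypothetical. On an NPU, the `sync` argument would need to be a device synchronization call (assumed here to be `torch.npu.synchronize` from `torch_npu`) so that asynchronous kernels are included in the measurement.

```python
import math
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending-sorted list (assumed method)."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_latencies(latencies_ms, batch_size=1):
    """Reduce per-iteration latencies (milliseconds) to summary statistics."""
    ordered = sorted(latencies_ms)
    avg = sum(ordered) / len(ordered)
    return {
        "avg_latency": avg,
        "min_latency": ordered[0],
        "max_latency": ordered[-1],
        "p50_latency": percentile(ordered, 50),
        "p90_latency": percentile(ordered, 90),
        "p95_latency": percentile(ordered, 95),
        # images/sec = images per iteration / seconds per iteration
        "throughput": batch_size * 1000.0 / avg,
    }


def benchmark(run_once, warmup=5, iters=20, sync=None, batch_size=1):
    """Time run_once() over iters iterations after warmup untimed iterations."""
    for _ in range(warmup):
        run_once()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        if sync is not None:
            sync()  # wait for the device so the full forward pass is timed
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize_latencies(latencies_ms, batch_size)
```

Under this throughput definition and a batch size of 1, an average latency of 11.85 ms corresponds to roughly 1000 / 11.85 ≈ 84.4 images/sec, which is in line with the reported 84.37 images/sec.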
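This project includes only a single-image smoke consistency check; it is not an evaluation on the full official ImageNet validation set. See section 4 for detailed metrics.
See screenshots/self_verification.png.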

- logs/inference.log: inference results
- logs/accuracy.log: accuracy consistency check
- logs/benchmark.log: performance benchmark
- logs/env_check.log: environment check

#NPU