timm/convnextv2_huge.fcmae_ft_in22k_in1k_512 on Ascend NPU

1. Introduction

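A pretrained ConvNeXt-V2 Huge model. It was pretrained with FCMAE (Fully Convolutional Masked Autoencoder), then fine-tuned on ImageNet-22K followed by ImageNet-1K. The input resolution is 512x512 and the model has 660.3M parameters.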

2. Verification environment

Hardware: Huawei Ascend 910B NPU (Ascend910_9362)
Framework: PyTorch + torch_npu
Weight source: ModelScope snapshot_download
Loading method: timm.create_model(pretrained=False) + loading weights from local files
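
The loading method listed above could look roughly like the following. This is a minimal sketch under assumptions, not the project's inference.py: `snapshot_dir` stands for the directory returned by ModelScope's snapshot_download, and the file names in `CANDIDATE_WEIGHT_FILES` and the helper names `find_weight_file` and `load_model` are hypothetical, because the snapshot layout is not described here.

```python
from pathlib import Path

# Assumed candidate file names; the real snapshot layout is not stated in this README.
CANDIDATE_WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def find_weight_file(snapshot_dir):
    """Return the first candidate weight file that exists in snapshot_dir."""
    for name in CANDIDATE_WEIGHT_FILES:
        path = Path(snapshot_dir) / name
        if path.is_file():
            return path
    raise FileNotFoundError(f"no known weight file found in {snapshot_dir}")


def load_model(snapshot_dir, device="npu"):
    """Build the model without downloading weights, then load the local checkpoint."""
    # Imported lazily so find_weight_file works without these packages installed.
    import timm
    import torch_npu  # noqa: F401  (makes the "npu" device available to PyTorch)

    weight_file = find_weight_file(snapshot_dir)
    model = timm.create_model(
        "convnextv2_huge.fcmae_ft_in22k_in1k_512",
        pretrained=False,                   # no download attempt by timm
        checkpoint_path=str(weight_file),   # weights come from the local file
    )
    return model.eval().to(device)
```

With `pretrained=False` and an explicit `checkpoint_path`, timm builds the architecture and reads the weights from the local file only.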

3. Running inference

pip install -r requirements.txt
python inference.py

Inference produces logits of shape [1, 1000]. Top-5 predictions:

Rank	Class	Probability
1	class_634	0.1564
2	class_416	0.1471
3	class_839	0.0364
4	class_491	0.0272
5	class_608	0.0198
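
As an illustration of how a Top-5 table like the one above can be derived from a logits vector, here is a minimal sketch. It assumes the probabilities are a softmax over the logits, which is not stated explicitly here. The function names and demo values are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return (class_index, probability) pairs for the k largest probabilities."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    demo_logits = [0.1, 2.0, -1.0, 1.5, 0.3, 0.0]  # made-up values, not model output
    for rank, (idx, prob) in enumerate(top_k(demo_logits, k=3), start=1):
        print(f"{rank}\tclass_{idx}\t{prob:.4f}")
```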

4. Accuracy verification

CPU vs. NPU consistency check on a single test image:

Metric	Value
max_abs_error	0.014772
mean_abs_error	0.002292
relative_error	0.3217%
cosine_similarity	0.999995
threshold	1.0%
Result	PASS

CPU Top-1: class_634
NPU Top-1: class_634
CPU Top-5: class_634, class_416, class_839, class_491, class_608
NPU Top-5: class_634, class_416, class_839, class_491, class_608
Top-1 match: True
Top-5 match: True
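
The metrics above can be computed from the two logits vectors along these lines. This is a sketch under assumptions: `relative_error` is taken to be the L2 norm of the difference divided by the L2 norm of the CPU output, and the threshold is applied to that value. Neither definition is spelled out here, and the project's accuracy script may differ.

```python
import math


def consistency_metrics(cpu_logits, npu_logits, threshold=0.01):
    """Compare two equally sized logits vectors (flat lists of floats)."""
    if len(cpu_logits) != len(npu_logits):
        raise ValueError("logits vectors must have the same length")

    diffs = [abs(c - n) for c, n in zip(cpu_logits, npu_logits)]
    cpu_norm = math.sqrt(sum(c * c for c in cpu_logits))
    npu_norm = math.sqrt(sum(n * n for n in npu_logits))
    if cpu_norm == 0.0 or npu_norm == 0.0:
        raise ValueError("zero-norm output, metrics are undefined")

    diff_norm = math.sqrt(sum(d * d for d in diffs))
    dot = sum(c * n for c, n in zip(cpu_logits, npu_logits))
    relative_error = diff_norm / cpu_norm  # assumed definition (L2 ratio)

    return {
        "max_abs_error": max(diffs),
        "mean_abs_error": sum(diffs) / len(diffs),
        "relative_error": relative_error,
        "cosine_similarity": dot / (cpu_norm * npu_norm),
        "passed": relative_error < threshold,  # assumed pass criterion
    }
```

Calling `consistency_metrics(cpu_out, npu_out)` on the flattened [1, 1000] outputs returns a dictionary with the same metric names as the table.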

5. Performance reference

Metric	Value
Input size	1x3x512x512
Mean latency	61.78 ms
Min latency	61.70 ms
Max latency	61.87 ms
P50 latency	61.77 ms
P90 latency	61.87 ms
P95 latency	61.87 ms
Throughput	16.19 images/sec
Runs	10
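
The summary statistics in this table can be reproduced from a list of per-run latencies roughly as follows. This is a sketch, not the project's benchmark script: the nearest-rank percentile method and the function names are assumptions. Throughput is taken as batch size divided by mean latency, which matches the reported numbers (1000 / 61.78 ms ≈ 16.19 images/sec).

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list, with pct in (0, 100]."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms, batch_size=1):
    """Summarize per-run latencies (milliseconds) into the table's metrics."""
    ordered = sorted(latencies_ms)
    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": mean_ms,
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "p50_ms": percentile(ordered, 50),
        "p90_ms": percentile(ordered, 90),
        "p95_ms": percentile(ordered, 95),
        "images_per_sec": batch_size * 1000.0 / mean_ms,
        "runs": len(ordered),
    }
```

When timing on an accelerator, the device usually has to be synchronized before each clock reading, otherwise asynchronous kernel launches make the measured latency too low.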

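6. Accuracy evaluation notes

This project includes only a single-image smoke consistency check, not an evaluation on the official full ImageNet validation set. See Section 4 for the detailed metrics.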

7. Self-verification screenshot

See screenshots/self_verification.png

8. Log files

logs/inference.log: inference results
logs/accuracy.log: CPU-NPU accuracy consistency check
logs/benchmark.log: performance benchmark

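9. Notes

The model is very large (660.3M parameters, 512 input resolution), so loading and inference need a substantial amount of device memory
Preprocessing is generated automatically with timm.data.resolve_model_data_config, so mean/std do not need to be specified manually
Weights are downloaded through ModelScope rather than directly from HuggingFace
No weight files are committed (.bin/.safetensors/.pth/.pt/.ckpt/.onnx)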

10. Tags

#NPU
