mobilenetv3_large_100.ra_in1k on Ascend NPU

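1. Introduction

This document records the migration, accuracy evaluation, and performance validation of the mobilenetv3_large_100.ra_in1k MobileNetV3-Large image classification model on the Ascend NPU (Ascend 910B3).

MobileNetV3-Large is the highest-accuracy variant in the MobileNetV3 family (about 5.5M parameters, width=1.0). This checkpoint was trained on ImageNet-1k with RandAugment (RA) data augmentation. Compared with the Small variant, Large uses more channels and more layers and is markedly more accurate (about 75% Top-1 versus about 67% for Small 1.0). The weights are published in timm format and are loaded with timm.create_model().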

2. Validation Environment

Component	Version
`torch`	`2.8.0`
`torch_npu`	`2.8.0.post4`
`timm`	`1.0.27`
`CANN`	`8.5.1`

NPU: 8 × Ascend 910B3

3. Deployment

conda create -n timm-models python=3.11 -y && conda activate timm-models
pip install torch==2.8.0 torch_npu==2.8.0.post4 timm safetensors pillow numpy -i https://pypi.tuna.tsinghua.edu.cn/simple
python inference.py --model_path ./mobilenetv3_large_100.ra_in1k --image photo.jpg --device npu
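
The source does not include inference.py itself. As a minimal sketch of how a script might accept the three flags used in the command above, assuming nothing beyond those flag names: the function name `build_parser`, the help texts, the default device, and the allowed device choices are all illustrative assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags shown in the deployment command."""
    parser = argparse.ArgumentParser(description="Classify one image with a timm model")
    parser.add_argument("--model_path", required=True,
                        help="directory holding config.json and model.safetensors")
    parser.add_argument("--image", required=True, help="path to the input image")
    # The default and the allowed choices are assumptions, not taken from inference.py.
    parser.add_argument("--device", default="npu", choices=["npu", "cpu"],
                        help="device to run inference on")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--model_path", "./mobilenetv3_large_100.ra_in1k", "--image", "photo.jpg"]
)
print(args.model_path, args.image, args.device)
```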

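4. Performance

Test conditions: 6 synthetic 224×224 images, batch_size=8, 1 NPU warm-up round.

Metric	Value
NPU throughput	`93.0` img/s

The Large variant (5.5M) runs slightly slower than Small 1.0 (2.5M) but remains suitable for real-time inference.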
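
The measurement script is not part of this document. A throughput figure of this kind could be obtained along the following lines: run a warm-up pass, time a fixed number of batches, and divide images by elapsed seconds. The helper name `measure_throughput`, its parameters, and the stand-in workload are assumptions made for illustration. On an NPU, `run_batch` would run the model and then synchronize the device (for example with `torch.npu.synchronize()`) so that asynchronous execution does not distort the timing.

```python
import time
from typing import Callable


def measure_throughput(run_batch: Callable[[], None], batch_size: int,
                       warmup: int = 1, iters: int = 10) -> float:
    """Return images per second for `run_batch`, excluding warm-up iterations."""
    for _ in range(warmup):
        run_batch()  # early calls absorb one-off costs such as operator compilation
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


def fake_batch() -> None:
    """Stand-in workload so the sketch runs without an NPU."""
    time.sleep(0.001)


ips = measure_throughput(fake_batch, batch_size=8, warmup=1, iters=5)
print(f"{ips:.1f} img/s")
```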

5. Accuracy

Metric	Value
Mean cosine similarity	`0.999998`
MAE	`0.000001`
Accuracy error rate	`0.0002%`
Top-1 accuracy	`100.0%`

Conclusion: the accuracy error rate is 0.0002%, so the evaluation passes.
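
For reference, the sketch below shows how cosine similarity, MAE, and Top-1 agreement can be computed between a reference output vector and an NPU output vector. It is a dependency-free illustration, not the evaluation script behind the table: the function names and sample vectors are made up, and the accuracy error rate is omitted because the source does not define it.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_absolute_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average element-wise absolute difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def top1_match(a: Sequence[float], b: Sequence[float]) -> bool:
    """True when both vectors have their largest value at the same index."""
    return max(range(len(a)), key=a.__getitem__) == max(range(len(b)), key=b.__getitem__)


# Made-up output vectors for illustration only; these are not measured values.
ref_out = [0.10, 0.70, 0.20]
npu_out = [0.10, 0.70, 0.20001]
print(cosine_similarity(ref_out, npu_out))
print(mean_absolute_error(ref_out, npu_out))
print(top1_match(ref_out, npu_out))
```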

6. Migration Notes

6.1 Model Structure

Backbone: MobileNetV3-Large (width=1.0, about 5.5M parameters)
Head: GAP + Conv 1×1 + FC → 1000
Input: 224×224 RGB
Augmentation: trained with RandAugment (RA)

6.2 Adaptation Code

import json
import timm
import torch_npu  # makes the "npu" device available to PyTorch
from safetensors.torch import load_file

with open("config.json") as f:
    cfg = json.load(f)
model = timm.create_model(cfg["architecture"], pretrained=False, num_classes=cfg["num_classes"])
model.load_state_dict(load_file("model.safetensors"), strict=False)
model.to("npu:0").eval()
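
Because `strict=False` lets `load_state_dict` skip mismatched keys without raising, a checkpoint with wrong key names could leave layers at their random initialization. A hedged sketch of a sanity check follows. The helper name `diff_keys` and the key names are hypothetical. In real use the two inputs would be `model.state_dict().keys()` and the keys of the dictionary returned by `load_file`. The value returned by `load_state_dict` also exposes `missing_keys` and `unexpected_keys`.

```python
from typing import Iterable, List, Tuple


def diff_keys(model_keys: Iterable[str],
              ckpt_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected): keys only in the model, keys only in the checkpoint."""
    model_set, ckpt_set = set(model_keys), set(ckpt_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected


# Hypothetical key names, chosen only to show a mismatch.
missing, unexpected = diff_keys(
    ["conv_stem.weight", "classifier.weight", "classifier.bias"],
    ["conv_stem.weight", "classifier.weight", "head.bias"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists are empty, every parameter in the model received a value from the checkpoint.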

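7. Notes

Large vs Small: Large (5.5M) is markedly more accurate than Small (2.5M) and suits mobile scenarios with higher accuracy requirements.
RA augmentation: models trained with RandAugment are more robust to variations in image quality.
Shared timm adaptation: uses the same code as the mobilenetv3_small series.
First NPU inference: with 5.5M parameters, operator compilation takes about 2-3 seconds on the first run.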
