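ResNet-18 Adapted for NPU

Model Introduction

ResNet-18 is an image classification model based on the Residual Network (ResNet) architecture, released by Microsoft. It was trained on the ImageNet-1k dataset (1,000 classes) and uses an 18-layer residual structure. The ResNet architecture was a major milestone in computer vision at the time and won the ILSVRC & COCO 2015 competitions.

This repository adapts ResNet-18 to run on Huawei Ascend NPUs, with support for accelerated NPU inference.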

Original Model Information

Model name: ResNet-18
Original model URL: https://www.modelscope.cn/models/microsoft/resnet-18
Task: Image Classification
Framework: PyTorch (HuggingFace Transformers)
Architecture: ResNetForImageClassification
Input format: RGB image, 224×224, normalized with the ImageNet mean and standard deviation
- Mean: [0.485, 0.456, 0.406]
- Std: [0.229, 0.224, 0.225]
Output format: 1000-dimensional logits vector (one entry per ImageNet-1k class)
Paper: Deep Residual Learning for Image Recognition
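To make the input normalization concrete, here is a minimal sketch of what the mean/std values above do to a single pixel. It assumes 8-bit pixel values are first rescaled to [0, 1] before standardization; that rescaling step is an assumption, not something stated in this README. The function name `normalize_pixel` is hypothetical.

```python
# ImageNet channel statistics quoted in the model information above.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Normalize one 8-bit RGB pixel.

    Assumption: values are rescaled to [0, 1] first, then standardized
    per channel with the ImageNet mean and standard deviation.
    """
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


# A black pixel maps to -mean/std in each channel.
print(normalize_pixel((0, 0, 0)))
```

In practice the image processor loaded from the model directory applies this step to the whole image, so the sketch is only meant to show the arithmetic.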

Environment Setup

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt

Dependency List

Package	Version requirement
torch	>=2.0.0
torch_npu	>=2.0.0
torchvision	>=0.15.0
transformers	>=4.30.0
Pillow	>=9.0.0
numpy	>=1.21.0

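NPU Adaptation Notes

Steps for migrating the PyTorch model to a Huawei Ascend NPU:

Download the original PyTorch model weights from ModelScope
Load the ResNetForImageClassification model with the transformers library
Move the model to the NPU device with .to(torch.device("npu:0"))
Move the input data to the NPU as well: {k: v.to(device) for k, v in inputs.items()}
After NPU inference finishes, move the output back to the CPU for post-processing

Key Adaptation Points

NPU device selection: torch.device("npu:0")
Input data must be moved to the NPU explicitly
Output must be moved back to the CPU for further processing
NPU backend support is provided by the torch_npu package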
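The device-selection point above can be sketched as a small helper that falls back to the CPU when the NPU backend is not usable. This is a hedged illustration, not the logic of `inference.py`: the function name `pick_device` and the fallback behavior are assumptions. It relies on `torch.npu.is_available()`, which becomes available once `torch_npu` has been imported.

```python
import importlib.util


def pick_device(preferred="npu", index=0):
    """Return "npu:<index>" when the NPU backend is usable, otherwise "cpu".

    Hypothetical helper: the fallback policy is an assumption for illustration.
    """
    if preferred != "npu":
        return "cpu"
    # Both packages must be installed before the "npu" device type can exist.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    if importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  imported for its side effect: registers the NPU backend

        return f"npu:{index}" if torch.npu.is_available() else "cpu"
    except Exception:
        # A broken driver or runtime install should not crash device selection.
        return "cpu"


print(pick_device("npu"))
```

The returned string can be passed to `torch.device(...)`, so the same script body works on machines with and without an Ascend device.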

Inference Scripts

File	Description
`inference.py`	CPU and NPU inference script; switch with `--device cpu/npu`
`compare_cpu_npu.py`	CPU vs. NPU accuracy comparison script
`requirements.txt`	Dependency list

Inference Commands

CPU inference

python inference.py --device cpu \
  --model-dir /path/to/model \
  --output-dir /path/to/output

NPU inference

python inference.py --device npu \
  --model-dir /path/to/model \
  --output-dir /path/to/output

CPU/NPU accuracy comparison

python compare_cpu_npu.py /path/to/output /path/to/output /path/to/output

CPU/NPU Inference Results

Top-5 Prediction Comparison

Rank	CPU predicted class	CPU probability	NPU predicted class	NPU probability
1	jellyfish	0.2735	jellyfish	0.2734
2	hammerhead, hammerhead shark	0.0577	hammerhead, hammerhead shark	0.0577
3	bubble	0.0337	bubble	0.0337
4	electric ray, crampfish, numbfish, torpedo	0.0328	electric ray, crampfish, numbfish, torpedo	0.0328
5	sea snake	0.0262	sea snake	0.0262
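The probabilities in the table come from applying softmax to the 1000 logits and keeping the five largest. A minimal standard-library sketch of that step follows; the function name `top_k_probs` is hypothetical and the repository's scripts may compute this differently.

```python
import math


def top_k_probs(logits, k=5):
    """Return the k most likely (class_id, probability) pairs from raw logits."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy example with three classes instead of 1000.
for class_id, prob in top_k_probs([1.0, 3.0, 2.0], k=2):
    print(class_id, round(prob, 4))
```

Because softmax is monotonic, the Top-5 ranking by probability is the same as the ranking by logit, which is why small logit differences between CPU and NPU leave the ranking unchanged here.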

Top-10 Logits Comparison

Rank	Class ID	CPU logit	NPU logit	Absolute difference
1	107	7.3036	7.3030	0.0006
2	4	5.7479	5.7471	0.0008
3	971	5.2091	5.2093	0.0002
4	5	5.1833	5.1823	0.0010
5	65	4.9574	4.9567	0.0007
6	6	4.9141	4.9133	0.0008
7	973	4.8954	4.8950	0.0004
8	611	4.8925	4.8922	0.0003
9	983	4.7669	4.7668	0.0001
10	329	4.3616	4.3612	0.0004

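CPU/NPU Accuracy Test Results

Accuracy Metrics

Metric	Value
Top-1 prediction agreement	✅ Match (class 107: jellyfish)
Top-5 overlap	5/5 (100%)
Cosine similarity	1.000000
L1 error (Mean Absolute Error on Logits)	0.000402
L2 error (RMSE on Logits)	0.000503
Maximum absolute error (Max Absolute Error)	0.001886
Relative error (Relative Error)	0.1368%
Mean Top-5 probability difference	0.000000
Mean of per-class maximum error	0.000402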
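As a rough guide to how metrics like those above can be computed from two logits vectors, here is a standard-library sketch. The function name `compare_logits` is hypothetical, and the relative-error formula (L2 norm of the difference divided by the L2 norm of the CPU logits) is an assumption: this README does not define it, and `compare_cpu_npu.py` may use a different definition.

```python
import math


def compare_logits(cpu, npu):
    """Compute agreement metrics between two equal-length logits sequences."""
    if len(cpu) != len(npu) or len(cpu) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(c - n) for c, n in zip(cpu, npu)]
    dot = sum(c * n for c, n in zip(cpu, npu))
    norm_cpu = math.sqrt(sum(c * c for c in cpu))
    norm_npu = math.sqrt(sum(n * n for n in npu))
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    return {
        "top1_match": cpu.index(max(cpu)) == npu.index(max(npu)),
        "cosine_similarity": dot / (norm_cpu * norm_npu),
        "mae": sum(diffs) / len(diffs),
        "rmse": diff_norm / math.sqrt(len(diffs)),
        "max_abs_error": max(diffs),
        # Assumed definition: ||cpu - npu||_2 / ||cpu||_2.
        "relative_error": diff_norm / norm_cpu,
    }


print(compare_logits([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]))
```

To apply it to saved arrays such as `cpu_logits.npy` and `npu_logits.npy`, the flattened arrays could be converted to lists first, for example with NumPy's `ravel().tolist()`.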

Performance Comparison

Device	Inference time	Speedup
CPU	0.0893s	1.00×
NPU (Ascend910)	0.0021s	42.52×
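The speedup column is the CPU time divided by the NPU time. The sketch below shows that arithmetic together with one possible way to measure latency; it is not how the numbers above were obtained, since this README does not describe the measurement procedure. The names `mean_latency` and `speedup` and the warm-up and iteration counts are assumptions. On an accelerator, work is typically queued asynchronously, so a synchronization callback (for example `torch.npu.synchronize`, assuming `torch_npu` is imported) can be passed as `sync`.

```python
import time


def mean_latency(run_once, warmup=3, iters=20, sync=None):
    """Average wall-clock seconds per call of run_once().

    sync, if given, is called before starting and before stopping the clock
    so that queued device work is included in the measurement.
    """
    for _ in range(warmup):
        run_once()  # warm-up runs are not timed
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters


def speedup(cpu_seconds, npu_seconds):
    """Ratio of CPU latency to NPU latency."""
    return cpu_seconds / npu_seconds


# Reproduces the speedup figure from the table above.
print(round(speedup(0.0893, 0.0021), 2))
```

Averaging over several iterations after a warm-up tends to give a more stable figure than timing a single call, because the first call on a device often includes one-time setup cost.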

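Conclusion

The relative error between NPU and CPU inference results is below 1% (measured: 0.1368%).

Top-1 predictions match exactly
Top-5 predictions match exactly, with 100% overlap
Cosine similarity reaches 1.000000
The maximum absolute error on logits is only 0.001886, and the relative error of 0.1368% is well below the 1% threshold
NPU inference is 42.52× faster than CPU inference in this measurement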

File Structure

resnet-18-npu/
├── inference.py          # Inference script (supports CPU/NPU)
├── compare_cpu_npu.py    # CPU/NPU accuracy comparison script
├── requirements.txt      # Dependency list
├── readme.md             # This file
├── model/                # Original model files
└── output/               # Inference and comparison results
    ├── cpu_result.json   # CPU inference result
    ├── cpu_logits.npy    # CPU logits
    ├── npu_result.json   # NPU inference result
    ├── npu_logits.npy    # NPU logits
    └── comparison_result.json  # Accuracy comparison result

Deployment and Inference

import numpy as np
import torch
import torch_npu  # noqa: F401  importing it registers the "npu" device with PyTorch
from PIL import Image
from transformers import AutoImageProcessor, ResNetForImageClassification

# Load the image processor and the model
model_dir = "./model/microsoft/resnet-18"
image_processor = AutoImageProcessor.from_pretrained(model_dir)
model = ResNetForImageClassification.from_pretrained(model_dir)

# Move the model to the NPU and switch to evaluation mode
device = torch.device("npu:0")
model = model.to(device)
model.eval()

# Prepare the input; convert to RGB so grayscale or RGBA images also work
img = Image.open("your_image.jpg").convert("RGB")
inputs = image_processor(img, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Move the logits back to the CPU and compute the Top-5 predictions
logits = outputs.logits.cpu().numpy()
probs = torch.nn.functional.softmax(torch.from_numpy(logits), dim=-1).numpy()
top5_idx = np.argsort(logits[0])[-5:][::-1]

for rank, idx in enumerate(top5_idx):
    label = model.config.id2label[int(idx)]
    print(f"{rank+1}. {label} (prob: {probs[0][idx]:.4f})")

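Evidence of Successful Inference

This repository provides complete inference scripts that support both CPU and NPU:

# NPU inference
python3 inference.py --device npu

# CPU inference
python3 inference.py --device cpu

When inference finishes, the script prints the results and the elapsed time, which shows that the model ran successfully on the NPU.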

