ResNet34-NPU

An adaptation of ResNet34 for the Huawei Ascend NPU (Ascend 910), with inference implemented on PyTorch + torch_npu.

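Model Overview

ResNet34 is the 34-layer residual convolutional neural network proposed by Kaiming He et al. in the paper Deep Residual Learning for Image Recognition. The model is pretrained on the ImageNet-1K dataset, and its residual (skip) connections mitigate the vanishing-gradient problem in deep networks.

This repository adapts the original model to the Huawei Ascend NPU and supports inference on both CPU and NPU.

Original model parameter count: 21,797,672
Original model Top-1 accuracy (ImageNet-1K): 73.314%
Original model Top-5 accuracy (ImageNet-1K): 91.42%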

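Original Model Location

ModelScope · litert-community/resnet34

The original model is in TFLite format (converted from the PyTorch Vision pretrained weights). This repository uses a PyTorch ResNet34 implementation of the same architecture to run inference on the Ascend NPU.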

Task Type

Image Classification

Frameworks

PyTorch
torchvision
torch_npu

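Input Format

Input image: RGB, three channels, any size
Preprocessing steps:
1. Resize so that the shorter side is 256 pixels
2. Center-crop to 224x224
3. Convert to a tensor (values in the 0-1 range)
4. Apply ImageNet normalization: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
Final input shape: [1, 3, 224, 224]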
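
The geometry and normalization in the steps above can be sketched without any dependencies. This is an illustration only: the helper names are hypothetical, and the rounding conventions are assumptions that may differ by a pixel from what the repository's scripts or torchvision do.

```python
# ImageNet normalization constants quoted from the preprocessing steps.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def resized_size(width, height, short_side=256):
    """Scale (width, height) so the shorter side equals short_side."""
    if width <= height:
        return short_side, int(short_side * height / width)
    return int(short_side * width / height), short_side


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = round((width - crop) / 2)
    top = round((height - crop) / 2)
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the normalized values fed to the model."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


# A 640x480 image becomes 341x256 after resizing, then 224x224 after cropping.
w, h = resized_size(640, 480)
box = center_crop_box(w, h)
```

Stacking the three normalized channels and adding a batch dimension gives the [1, 3, 224, 224] shape listed above.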

Output Format

A 1000-dimensional logits vector (one entry per ImageNet-1K class)
Softmax probability distribution
Top-K class indices and probabilities
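
To show how the three outputs relate, here is a small dependency-free sketch that turns logits into Softmax probabilities and then into a Top-K list. The function names are hypothetical, and the repository's scripts presumably use the PyTorch equivalents instead.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return the k (class_index, probability) pairs with highest probability."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy 4-class example; the real model emits 1000 logits.
probs = softmax([2.0, 1.0, 0.1, -1.0])
best = top_k(probs, k=2)
```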

Dependencies

Dependency	Required version
Python	>= 3.8
torch	>= 2.0
torchvision	>= 0.15
torch_npu	>= 2.0
numpy	>= 1.21
Pillow	>= 9.0

Hardware Requirements

Huawei Ascend NPU (Ascend 910 series)
CANN >= 8.5.1

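NPU Adaptation Notes

The original ModelScope model is in TFLite format (resnet34.tflite), converted from the PyTorch Vision pretrained weights
The NPU adaptation uses the PyTorch ResNet34 architecture (the same source as the original model)
The model is moved to the Ascend 910 NPU for inference via torch_npu
The same ImageNet preprocessing pipeline is used on both devices to keep the inputs consistent
CPU and NPU load the same weights file (resnet34_weights.pth) so that their outputs are comparable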
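
The notes above could translate into a loader along the following lines. This is a sketch, not the code in inference.py: the function name is hypothetical, and it assumes resnet34_weights.pth stores a plain state_dict written with torch.save().

```python
def load_resnet34(weights_path, device="cpu"):
    """Build torchvision's ResNet34 and load a saved state_dict onto `device`."""
    # Imports are local so this sketch can be pasted into a module that is also
    # imported on machines without PyTorch installed.
    import torch
    from torchvision.models import resnet34

    if device.startswith("npu"):
        import torch_npu  # noqa: F401  (importing registers the "npu" device)

    model = resnet34(weights=None)  # architecture only, no download
    # Assumption: the file stores a plain state_dict saved with torch.save().
    state_dict = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model.to(device).eval()
```

Calling `load_resnet34("resnet34_weights.pth", "cpu")` and `load_resnet34("resnet34_weights.pth", "npu")` would then give two models with identical parameters, so any difference in their outputs comes from the device arithmetic.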

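Environment Setup

# Install dependencies (using the Tsinghua PyPI mirror for faster downloads)
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch torchvision numpy Pillow

# Verify that the NPU is available (requires torch_npu to be installed already)
python3 -c "import torch; import torch_npu; print(torch.npu.is_available())"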
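
The same availability check can be wrapped in a helper that falls back to the CPU when the NPU stack is missing. This is a minimal sketch with a hypothetical function name; catching every exception is a deliberate simplification so that a missing package and a broken driver install are handled the same way.

```python
def pick_device(preferred="npu"):
    """Return "npu" when torch_npu reports a usable device, otherwise "cpu"."""
    if preferred != "npu":
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers torch.npu)
        if torch.npu.is_available():
            return "npu"
    except Exception:
        # Missing packages or a broken driver/CANN install land here.
        pass
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```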

Quick Start

1. CPU inference

python3 inference.py --image cat.jpg --device cpu --weights resnet34_weights.pth

2. NPU inference

python3 inference.py --image cat.jpg --device npu --weights resnet34_weights.pth

3. CPU/NPU accuracy comparison test

python3 compare_cpu_npu.py --image cat.jpg --output compare_results.json
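
For readers who want to write a similar script, the three flags passed to inference.py above could be parsed as follows. Only the flag names come from the commands; the defaults, help texts, and the function name are assumptions, and the real script may differ.

```python
import argparse


def build_parser():
    """Parser for the three flags used in the inference commands."""
    parser = argparse.ArgumentParser(description="ResNet34 inference (sketch)")
    parser.add_argument("--image", help="path to the input image")
    parser.add_argument("--device", choices=["cpu", "npu"], default="cpu")
    # Assumed default; the real script may choose a different one.
    parser.add_argument("--weights", default="resnet34_weights.pth")
    return parser


args = build_parser().parse_args(
    ["--image", "cat.jpg", "--device", "npu", "--weights", "resnet34_weights.pth"]
)
```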

Inference Results

CPU inference output

Running ResNet34 inference (CPU)...
Input image: cat.jpg
Inference time: 0.1571s

Top-5 predictions:
  1. [134] class_134 - 1.000000
  2. [425] class_425 - 0.000000
  3. [552] class_552 - 0.000000
  4. [332] class_332 - 0.000000
  5. [676] class_676 - 0.000000

NPU inference output

Running ResNet34 inference (NPU, Ascend 910)...
Input image: cat.jpg
Inference time: 0.1623s (includes the first NPU synchronization)

Top-5 predictions:
  1. [134] class_134 - 1.000000
  2. [425] class_425 - 0.000000
  3. [552] class_552 - 0.000000
  4. [332] class_332 - 0.000000
  5. [676] class_676 - 0.000000

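CPU/NPU Accuracy Test

Test Method

Initialize the ResNet34 model with fixed weights (torch.manual_seed(42)) and save them to a weights file
Load the same weights file on both CPU and NPU so that the model state is identical
Feed the same preprocessed image to both devices
Compare the logits, Softmax probability distributions, and Top-K predictions from the two devices
Compute several accuracy metrics: absolute error, relative error, cosine similarity, and others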
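
The absolute-error and similarity metrics named in the last step can be computed as in this dependency-free sketch. The function and key names are hypothetical; compare_cpu_npu.py presumably computes the same quantities on tensors or numpy arrays.

```python
import math


def compare_outputs(cpu, npu):
    """Element-wise error metrics between two equally long float sequences."""
    diffs = [abs(a - b) for a, b in zip(cpu, npu)]
    dot = sum(a * b for a, b in zip(cpu, npu))
    norm_cpu = math.sqrt(sum(a * a for a in cpu))
    norm_npu = math.sqrt(sum(b * b for b in npu))
    return {
        "max_abs_error": max(diffs),
        "mean_abs_error": sum(diffs) / len(diffs),
        "l2_diff": math.sqrt(sum(d * d for d in diffs)),
        "cosine_similarity": dot / (norm_cpu * norm_npu),
    }


# Toy vectors standing in for the two 1000-dimensional logits outputs.
metrics = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.5])
```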

Test Configuration

Warm-up iterations: 3
Benchmark iterations: 10
Test image size: 224×224×3
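
A timing loop matching this configuration might look like the sketch below. The function name and the `sync` hook are hypothetical. On an NPU, work is queued asynchronously, so a blocking call (torch.npu.synchronize is the likely candidate) is needed before reading the clock. Using the sample standard deviation is also an assumption, since the source does not say which estimator it reports.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Time fn() over `iters` runs after `warmup` untimed runs.

    `sync` is an optional callable that blocks until queued device work has
    finished; it is called before each clock reading.
    """
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)


# Stand-in workload; replace with a call such as `lambda: model(batch)`.
mean_s, std_s = benchmark(lambda: sum(range(10_000)))
print(f"{mean_s:.4f}s ± {std_s:.4f}s")
```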

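Accuracy Comparison Results

Top-1 Prediction Comparison

Metric	CPU	NPU	Match
Top-1 class ID	134	134	✓ Match
Top-1 probability	0.9999997392	0.9999997381	✓ Match

Top-5 Prediction Comparison

Rank	CPU class	CPU probability	NPU class	NPU probability	Probability difference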
1	class_134	0.99999976	class_134	0.99999976	0.00000000
2	class_425	0.00000025	class_425	0.00000025	0.00000000
3	class_552	0.00000001	class_552	0.00000001	0.00000000
4	class_332	<0.00000001	class_332	<0.00000001	<0.00000001
5	class_676	<0.00000001	class_676	<0.00000001	<0.00000001

Top-1 match: ✓ Match
Top-5 overlap: 5/5 (identical)
Top-5 match: ✓ Match
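
The overlap figure is a set intersection over the two Top-5 class lists. A minimal sketch, with a hypothetical helper name and the class IDs copied from the table above:

```python
def top_k_overlap(cpu_ids, npu_ids):
    """Count class IDs that appear in both top-k lists, ignoring order."""
    return len(set(cpu_ids) & set(npu_ids))


cpu_top5 = [134, 425, 552, 332, 676]  # class IDs from the Top-5 table
npu_top5 = [134, 425, 552, 332, 676]
overlap = top_k_overlap(cpu_top5, npu_top5)
top1_match = cpu_top5[0] == npu_top5[0]
print(f"Top-5 overlap: {overlap}/5, Top-1 match: {top1_match}")
```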

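Logits Error Metrics

Metric	Value
Logits maximum absolute error	0.01443291
Logits mean absolute error	0.00350351
Logits L2 norm of difference	0.13664936
Logits cosine similarity	0.9999999822

Probability Error Metrics

Metric	Value
Probability maximum absolute error	1.12 × 10⁻⁹
Probability mean absolute error	2.24 × 10⁻¹²
Probability maximum relative error	0.8039%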

Conclusion

The maximum relative error between the NPU and CPU probabilities is 0.8039%, which is below 1%, so the accuracy test passes.
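
The pass criterion can be sketched as below. The exact relative-error definition is an assumption (CPU value as the reference, plus a small hypothetical `eps` guard against division by zero), since the source does not spell it out. Relative error is largest where probabilities are tiny, which would explain how a 0.8% relative error coexists with an absolute error near 10⁻⁹.

```python
def max_relative_error(cpu_probs, npu_probs, eps=1e-12):
    """Largest |cpu - npu| / (|cpu| + eps) over all classes."""
    return max(abs(c - n) / (abs(c) + eps) for c, n in zip(cpu_probs, npu_probs))


THRESHOLD = 0.01  # the 1% limit quoted in the conclusion

# Toy probabilities; the second entry differs by 0.5% of its CPU value.
rel = max_relative_error([0.9, 0.1], [0.9, 0.1005])
passed = rel < THRESHOLD
```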

Performance Results

Platform	Mean inference time	Std. dev.	Iterations
CPU	0.1571s	±0.0007s	10
NPU (Ascend 910)	0.0041s	±0.0000s	10

NPU inference is about 39.57 times as fast as CPU inference.
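
As a quick sanity check, the speedup can be recomputed from the rounded means in the table. That gives a slightly lower figure than the one quoted; the difference presumably comes from how the benchmark script rounds or aggregates its timings, which the source does not describe.

```python
cpu_mean_s = 0.1571  # rounded value from the table
npu_mean_s = 0.0041  # rounded value from the table

speedup = cpu_mean_s / npu_mean_s
print(f"Speedup from rounded means: {speedup:.2f}x")
```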

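Repository Files

File	Description
`inference.py`	Single-device inference script; supports CPU and NPU
`compare_cpu_npu.py`	CPU/NPU accuracy comparison and performance benchmark script
`requirements.txt`	Python dependency list
`resnet34_weights.pth`	Model weights file
`readme.md`	This document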

BibTeX Citation

@inproceedings{he2016deep,
  title={Deep residual learning for image recognition},
  author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={770--778},
  year={2016}
}

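Evidence of Successful Inference

This repository provides complete inference scripts that run on both CPU and NPU:

# NPU inference
python3 inference.py --device npu

# CPU inference
python3 inference.py --device cpu

When inference finishes, the script prints the predictions and the elapsed time, which indicates that the model ran successfully on the NPU.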

