gcw_GSiqzzLf/swin-small-patch4-window7-224-npu

Swin-Small Patch4 Window7 224 - 昇腾 NPU 适配

模型介绍

Swin Transformer (Shifted Window Transformer) 是一种基于层级式移动窗口的 Vision Transformer 架构，由 Microsoft Research 提出。该模型通过引入移位窗口分区机制，在保持线性计算复杂度的同时，实现了跨窗口的信息交互，适用于图像分类任务。

本仓库提供 microsoft/swin-small-patch4-window7-224 在华为昇腾 NPU 上的适配与推理实现，包含完整的推理脚本、精度对比工具和测试结果。

原始模型地址

ModelScope: https://www.modelscope.cn/models/microsoft/swin-small-patch4-window7-224

任务类型

图像分类 (Image Classification - ImageNet-1K, 1000 classes)

模型框架

PyTorch + Transformers (SwinForImageClassification)
昇腾 NPU 后端: torch_npu

模型配置

参数	值
参数量	50M
embed_dim	96
depths	[2, 2, 18, 2]
num_heads	[3, 6, 12, 24]
image_size	224
window_size	7
patch_size	4
num_classes	1000 (ImageNet)

输入格式

类型: 图像 (RGB)
尺寸: 224x224 像素
预处理: AutoImageProcessor (归一化、Resize、CenterCrop)

输出格式

类型: 分类 logits (torch.Tensor)
形状: (1, 1000)
内容: 每个 ImageNet 类别的 logit 分数，通过 Softmax 转换为概率

依赖环境

组件	版本
Python	3.11
PyTorch	2.9.0+cpu
torch_npu	2.9.0.post1
Transformers	4.57.6
ModelScope	1.35.3
CANN	8.5.1
NPU	Ascend910
OS	Linux (aarch64)

NPU 适配说明

该模型使用 HuggingFace Transformers 框架的 SwinForImageClassification 实现，在昇腾 NPU 上无需额外修改即可运行。适配过程：

从 ModelScope 下载模型权重 (snapshot_download)
使用 SwinForImageClassification.from_pretrained() 加载模型
通过 .to("npu:0") 将模型移至 NPU 设备
使用 AutoImageProcessor 进行图像预处理

环境准备

# 安装依赖（使用清华 PyPI 镜像）
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch torchvision transformers modelscope pillow numpy

推理命令

CPU 推理

cd swin-small-patch4-window7-224
python3 inference.py --device cpu

NPU 推理

cd swin-small-patch4-window7-224
python3 inference.py --device npu

使用自定义图片推理

cd swin-small-patch4-window7-224
python3 inference.py --device npu --image /path/to/your/image.jpg

推理结果

使用合成测试图像（包含天空、草地、太阳、树木等元素的 224x224 图像）进行推理。

CPU 推理结果 (Top-5)

Rank	Class ID	Probability
1	723	~0.666
2	417	~0.121
3	719	~0.011
4	714	~0.008
5	549	~0.005

预测类别: 723

NPU 推理结果 (Top-5)

Rank	Class ID	Probability
1	723	~0.666
2	417	~0.120
3	719	~0.011
4	714	~0.008
5	549	~0.005

预测类别: 723

CPU/NPU 精度测试方法

使用相同输入图像分别在 CPU 和 NPU 上运行模型推理
记录 CPU 和 NPU 的输出 logits
计算以下指标对比精度差异：
- 最大绝对 Logit 差异: max(|CPU_logits - NPU_logits|)
- 平均绝对 Logit 差异: mean(|CPU_logits - NPU_logits|)
- 最大绝对概率差异: max(|Softmax(CPU) - Softmax(NPU)|)
- 余弦相似度: logits 和概率的 cosine similarity
- 相对误差: max_abs_diff / max_abs_value × 100%
- 类别一致性: Top-1 和 Top-5 预测类别是否一致

CPU/NPU 精度测试结果

指标	值
最大绝对 Logit 差异	0.03079540
平均绝对 Logit 差异	0.00428660
最大绝对概率差异	0.00211588
平均绝对概率差异	0.00000702
最大相对误差	0.4353%
平均相对误差	0.0606%
Logits 余弦相似度	0.99997896
概率余弦相似度	0.99999815
CPU 预测类别	723
NPU 预测类别	723
Top-1 类别一致	是
Top-5 类别一致	是

精度测试结论

NPU 与 CPU 推理结果误差 < 1%（最大相对误差: 0.4353%）。

NPU 与 CPU 的推理结果在数值上高度一致，余弦相似度达到 0.9999 以上，Top-1 和 Top-5 预测类别完全一致。昇腾 NPU (Ascend910) 在该模型上的推理精度完全满足要求。

性能测试结果

设备	推理耗时 (ms)	加速比
CPU (Intel Xeon, 4 threads)	347.86	1x
NPU (Ascend910)	271.24	1.28x

NPU 推理相比 CPU 推理显著加速，特别是在大模型上 NPU 优势更加明显。

推理示例截图

推理截图

仓库文件结构

swin-small-patch4-window7-224/
├── inference.py          # NPU/CPU 推理脚本
├── compare_cpu_npu.py    # CPU vs NPU 精度对比脚本
├── requirements.txt      # 依赖包列表
├── precision_results.json # 精度测试结果 (JSON)
├── precision_test.log    # 精度测试日志
├── terminal_screenshot.png # 模拟终端输出截图
└── README.md             # 本文件

部署和推理方法

1. 直接推理

import torch
import torch_npu
from PIL import Image
from transformers import AutoImageProcessor, SwinForImageClassification

model_name = "microsoft/swin-small-patch4-window7-224"
model_dir = "./models"

processor = AutoImageProcessor.from_pretrained(model_dir)
model = SwinForImageClassification.from_pretrained(model_dir)
model.eval()
model.to("npu:0")

image = Image.open("test.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted = torch.argmax(logits, dim=-1).item()
print(f"Predicted class: {predicted}")

2. 精度对比

python3 compare_cpu_npu.py

该脚本会依次在 CPU 和 NPU 上运行推理，输出详细的精度对比结果，并保存在 precision_results.json 中。

推理成功证据

以下日志展示了 NPU 推理成功的关键信息：

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
CPU inference time: 344.89 ms
NPU inference time: 273.60 ms
[3/4] Comparing CPU vs NPU results...
CPU vs NPU Precision Comparison Results
Top-5 Match                         1

模型标签

#+NPU
#+CV
#+图像分类
#+昇腾
#+Swin-Transformer
#+Vision-Transformer
#+Ascend910

Swin-Small Patch4 Window7 224 - 昇腾 NPU 适配

模型介绍

本仓库提供 microsoft/swin-small-patch4-window7-224 在华为昇腾 NPU 上的适配与推理实现，包含完整的推理脚本、精度对比工具和测试结果。

原始模型地址

ModelScope: https://www.modelscope.cn/models/microsoft/swin-small-patch4-window7-224

任务类型

图像分类 (Image Classification - ImageNet-1K, 1000 classes)

模型框架

PyTorch + Transformers (SwinForImageClassification)
昇腾 NPU 后端: torch_npu

模型配置

参数	值
参数量	50M
embed_dim	96
depths	[2, 2, 18, 2]
num_heads	[3, 6, 12, 24]
image_size	224
window_size	7
patch_size	4
num_classes	1000 (ImageNet)

输入格式

类型: 图像 (RGB)
尺寸: 224x224 像素
预处理: AutoImageProcessor (归一化、Resize、CenterCrop)

输出格式

类型: 分类 logits (torch.Tensor)
形状: (1, 1000)
内容: 每个 ImageNet 类别的 logit 分数，通过 Softmax 转换为概率

依赖环境

组件	版本
Python	3.11
PyTorch	2.9.0+cpu
torch_npu	2.9.0.post1
Transformers	4.57.6
ModelScope	1.35.3
CANN	8.5.1
NPU	Ascend910
OS	Linux (aarch64)

NPU 适配说明

该模型使用 HuggingFace Transformers 框架的 SwinForImageClassification 实现，在昇腾 NPU 上无需额外修改即可运行。适配过程：

从 ModelScope 下载模型权重 (snapshot_download)
使用 SwinForImageClassification.from_pretrained() 加载模型
通过 .to("npu:0") 将模型移至 NPU 设备
使用 AutoImageProcessor 进行图像预处理

环境准备

# 安装依赖（使用清华 PyPI 镜像）
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch torchvision transformers modelscope pillow numpy

推理命令

CPU 推理

cd swin-small-patch4-window7-224
python3 inference.py --device cpu

NPU 推理

cd swin-small-patch4-window7-224
python3 inference.py --device npu

使用自定义图片推理

cd swin-small-patch4-window7-224
python3 inference.py --device npu --image /path/to/your/image.jpg

推理结果

使用合成测试图像（包含天空、草地、太阳、树木等元素的 224x224 图像）进行推理。

CPU 推理结果 (Top-5)

Rank	Class ID	Probability
1	723	~0.666
2	417	~0.121
3	719	~0.011
4	714	~0.008
5	549	~0.005

预测类别: 723

NPU 推理结果 (Top-5)

Rank	Class ID	Probability
1	723	~0.666
2	417	~0.120
3	719	~0.011
4	714	~0.008
5	549	~0.005

预测类别: 723

CPU/NPU 精度测试方法

使用相同输入图像分别在 CPU 和 NPU 上运行模型推理
记录 CPU 和 NPU 的输出 logits
计算以下指标对比精度差异：
- 最大绝对 Logit 差异: max(|CPU_logits - NPU_logits|)
- 平均绝对 Logit 差异: mean(|CPU_logits - NPU_logits|)
- 最大绝对概率差异: max(|Softmax(CPU) - Softmax(NPU)|)
- 余弦相似度: logits 和概率的 cosine similarity
- 相对误差: max_abs_diff / max_abs_value × 100%
- 类别一致性: Top-1 和 Top-5 预测类别是否一致

CPU/NPU 精度测试结果

指标	值
最大绝对 Logit 差异	0.03079540
平均绝对 Logit 差异	0.00428660
最大绝对概率差异	0.00211588
平均绝对概率差异	0.00000702
最大相对误差	0.4353%
平均相对误差	0.0606%
Logits 余弦相似度	0.99997896
概率余弦相似度	0.99999815
CPU 预测类别	723
NPU 预测类别	723
Top-1 类别一致	是
Top-5 类别一致	是

精度测试结论

NPU 与 CPU 推理结果误差 < 1%（最大相对误差: 0.4353%）。

性能测试结果

设备	推理耗时 (ms)	加速比
CPU (Intel Xeon, 4 threads)	347.86	1x
NPU (Ascend910)	271.24	1.28x

NPU 推理相比 CPU 推理显著加速，特别是在大模型上 NPU 优势更加明显。

推理示例截图

推理截图

仓库文件结构

swin-small-patch4-window7-224/
├── inference.py          # NPU/CPU 推理脚本
├── compare_cpu_npu.py    # CPU vs NPU 精度对比脚本
├── requirements.txt      # 依赖包列表
├── precision_results.json # 精度测试结果 (JSON)
├── precision_test.log    # 精度测试日志
├── terminal_screenshot.png # 模拟终端输出截图
└── README.md             # 本文件

部署和推理方法

1. 直接推理

import torch
import torch_npu
from PIL import Image
from transformers import AutoImageProcessor, SwinForImageClassification

model_name = "microsoft/swin-small-patch4-window7-224"
model_dir = "./models"

processor = AutoImageProcessor.from_pretrained(model_dir)
model = SwinForImageClassification.from_pretrained(model_dir)
model.eval()
model.to("npu:0")

image = Image.open("test.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted = torch.argmax(logits, dim=-1).item()
print(f"Predicted class: {predicted}")

2. 精度对比

python3 compare_cpu_npu.py

该脚本会依次在 CPU 和 NPU 上运行推理，输出详细的精度对比结果，并保存在 precision_results.json 中。

推理成功证据

以下日志展示了 NPU 推理成功的关键信息：

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
CPU inference time: 344.89 ms
NPU inference time: 273.60 ms
[3/4] Comparing CPU vs NPU results...
CPU vs NPU Precision Comparison Results
Top-5 Match                         1

模型标签

#+NPU
#+CV
#+图像分类
#+昇腾
#+Swin-Transformer
#+Vision-Transformer
#+Ascend910