cv_nafnet_image-deblur_gopro - Ascend NPU Adaptation

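1. Model Overview

NAFNet (Nonlinear Activation Free Network) is an image deblurring model trained on the GoPro dataset. It simplifies the network by removing nonlinear activation functions from the architecture while still delivering strong deblurring performance.

Original model: damo/cv_nafnet_image-deblur_gopro
Framework: PyTorch
Task: Image Deblurring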

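2. Ascend NPU Adaptation Results

Metric	Value
Cosine similarity	0.999958
Max absolute error	1.698051
Relative error	0.8650%
Average latency	28.17 ms
Peak device memory (HBM)	0.4361 GB
Parameters	67,888,835
Inference precision	float32
Device	Ascend 910B4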

3. Environment Requirements

Component	Version
CANN	8.5.1
torch_npu	2.9.0.post1
PyTorch	2.9.0
Python	3.11
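
The Python-side versions in the table above can be checked before running inference. The following is a minimal sketch, not part of the original repository: the helper name `check_versions` and the `EXPECTED` mapping are hypothetical, and the distribution names `torch` and `torch_npu` are assumed to match how the packages were installed. CANN is not a Python package, so it is not covered here.

```python
from importlib import metadata

# Versions copied from the requirements table (assumed distribution names).
EXPECTED = {"torch": "2.9.0", "torch_npu": "2.9.0.post1"}


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed.
    `lookup` is injectable so the check can be exercised without the packages.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cpu" when comparing.
        base = found.split("+")[0] if found else None
        if base != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for name, (want, found) in problems.items():
        print(f"{name}: expected {want}, found {found}")
    if not problems:
        print("Python package versions match the table.")
```

An empty result means the installed Python packages agree with the table; whether nearby versions also work is not something the source states.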

4. Quick Start

# Set up the environment
source setup_env.sh

# Run inference (CPU vs. NPU comparison)
python3 inference.py --device npu:0

5. Inference Output Evidence

NPU inference output (float32):

Model: cv_nafnet_image-deblur_gopro (NAFNet)
Device: npu:0
Dtype: float32
Input shape: [1, 3, 256, 256]
------------------------------------------------------------

[CPU] Loading model...
[CPU] Parameters: 67,888,835
[CPU] Running inference...
[CPU] Output shape: [1, 3, 256, 256]

[NPU] Moving model to npu:0...
[NPU] Running inference...
[NPU] Output shape: [1, 3, 256, 256]

  Cosine Similarity: 0.999958
  MaxAbsErr:         1.698051
  Relative Error:    0.8650%

  Avg latency:  28.17 ms
  Peak HBM: 0.4361 GB

PASS: CPU vs NPU outputs match (cosine >= 0.99)
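
The log reports an average latency, but the source does not show how inference.py measures it. One common approach is sketched below under stated assumptions: the helper `average_latency_ms`, the warm-up count, and the iteration count are all hypothetical. The optional `sync` callable stands in for a device synchronization call, which is needed on asynchronous devices so that the timer does not stop before queued work has finished.

```python
import time


def average_latency_ms(fn, warmup=5, iters=20, sync=None):
    """Average wall-clock time of fn() in milliseconds.

    sync: optional callable that blocks until the device has finished;
    None means no synchronization (e.g. plain CPU code).
    """
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Warm-up runs are excluded so one-time costs do not skew the mean.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iters
```

On an Ascend device, `fn` would presumably wrap the model's forward pass and `sync` would be `torch.npu.synchronize`; both are assumptions about usage rather than a description of the actual script.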

6. CPU vs. NPU Accuracy Comparison

Metric	CPU (float32)	NPU (float32)	Error
Cosine similarity	Baseline	0.999958	< 0.01%
Max absolute error	-	1.698051	-
Relative error	-	0.8650%	< 1% ✓
Output shape	[1, 3, 256, 256]	[1, 3, 256, 256]	Match
NaN present	False	False	Match
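
The metrics above could be computed along the following lines. This is a sketch under assumptions, not the code from inference.py: the function name `compare_outputs` is hypothetical, the outputs are assumed to be flattened into plain sequences of floats, and the relative error is assumed to be the mean absolute difference divided by the mean absolute reference value (the source does not give its definition). The 0.99 cosine threshold is taken from the PASS line in the log.

```python
import math


def compare_outputs(ref, out, cos_threshold=0.99):
    """Compare two equally long flat sequences of floats (ref = CPU baseline)."""
    if len(ref) != len(out) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    dot = sum(r * o for r, o in zip(ref, out))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_out = math.sqrt(sum(o * o for o in out))
    if norm_ref == 0.0 or norm_out == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    cosine = dot / (norm_ref * norm_out)
    abs_diff = [abs(r - o) for r, o in zip(ref, out)]
    # Assumed definition: mean |diff| relative to mean |reference|
    # (the common 1/n factors cancel).
    rel_err = sum(abs_diff) / sum(abs(r) for r in ref)
    return {
        "cosine": cosine,
        "max_abs_err": max(abs_diff),
        "rel_err": rel_err,
        "has_nan": any(math.isnan(v) for v in out),
        "passed": cosine >= cos_threshold,
    }


result = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.5])
print(result["max_abs_err"], result["passed"])
```

Cosine similarity is insensitive to a few large per-element deviations, which is why a high cosine value can coexist with a noticeable maximum absolute error; reporting both, as the table does, gives a fuller picture.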

7. Model Architecture

Architecture: NAFNet (Nonlinear Activation Free Network)
Input: blurred image [1, 3, 256, 256]
Output: deblurred image [1, 3, 256, 256]
Training data: GoPro dataset

8. Verification Report

See screenshots/verification.txt for details.

9. Agent Skill

This adaptation was completed automatically by the Ascend NPU adaptation Agent Skill.
