cv_nafnet_image-deblur_reds - Ascend NPU Adaptation

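1. Model Overview

NAFNet (Nonlinear Activation Free Network) is an image deblurring model, here trained on the REDS dataset. NAFNet simplifies the network structure by removing nonlinear activation functions while retaining strong deblurring performance.

Original model: damo/cv_nafnet_image-deblur_reds
Framework: PyTorch
Task: Image Deblurring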
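
To make "no nonlinear activation functions" concrete: the NAFNet paper replaces activations such as ReLU or GELU with a gating step (SimpleGate) that splits the channels into two halves and multiplies them element-wise. The sketch below is a framework-free illustration of that idea on nested Python lists. The function name `simple_gate` and the list-based layout are assumptions made for this example, not the code shipped with this model.

```python
def simple_gate(channels):
    """Multiply the first half of the channels by the second half.

    `channels` is a list of C per-channel value lists; C must be even.
    Returns C // 2 channels. (Illustrative layout, not a tensor API.)
    """
    if len(channels) % 2 != 0:
        raise ValueError("channel count must be even")
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    # Element-wise product of matching channels from each half.
    return [[a * b for a, b in zip(c1, c2)] for c1, c2 in zip(first, second)]


# Four channels with two values each are gated down to two channels.
x = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0], [2.0, 0.0]]
y = simple_gate(x)
print(y)  # [[0.5, -2.0], [6.0, 0.0]]
```

The output has half as many channels as the input, which is why the layers around such a gate are sized accordingly.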

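2. Ascend NPU Adaptation Results

Metric	Value
Cosine similarity	1.000000
Max absolute error	3.401489
Relative error	0.1827%
Average latency	28.16 ms
Peak device memory (HBM)	0.4361 GB
Parameters	67,888,835
Inference precision	float32
Device	Ascend 910B4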
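
An average latency figure like the one above is typically obtained by timing several forward passes after a few warm-up runs. The following is a minimal sketch of such a measurement, not the timing code in inference.py. The helper `measure_latency_ms`, its parameters, and the stand-in workload are hypothetical. On an NPU the `sync` argument would presumably be `torch.npu.synchronize`, so that asynchronous kernels finish before the clock is read.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=3, iters=10, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds.

    `sync` is an optional callable that blocks until device work is done.
    """
    for _ in range(warmup):  # warm-up runs are not timed
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


# Stand-in workload; a real run would call the model's forward pass here.
avg_ms = measure_latency_ms(lambda: sum(range(10_000)))
print(f"Avg latency: {avg_ms:.2f} ms")
```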

3. Environment Requirements

Component	Version
CANN	8.5.1
torch_npu	2.9.0.post1
PyTorch	2.9.0
Python	3.11

4. Quick Start

# Set up the environment
source setup_env.sh

# Run inference (CPU vs. NPU comparison)
python3 inference.py --device npu:0
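
How inference.py handles the `--device` argument is not shown here. As a hedged illustration, a script of this kind might fall back to the CPU when `torch_npu` is not installed or no NPU is visible. The helper name `pick_device` is hypothetical.

```python
def pick_device(preferred="npu:0"):
    """Return `preferred` if an Ascend NPU is usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers the NPU backend)
    except ImportError:
        return "cpu"
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        return preferred
    return "cpu"


device = pick_device("npu:0")
print(f"Device: {device}")
```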

5. Inference Output Evidence

NPU inference output (float32):

Model: cv_nafnet_image-deblur_reds (NAFNet)
Device: npu:0
Dtype: float32
Input shape: [1, 3, 256, 256]
------------------------------------------------------------

[CPU] Loading model...
[CPU] Parameters: 67,888,835
[CPU] Running inference...
[CPU] Output shape: [1, 3, 256, 256]

[NPU] Moving model to npu:0...
[NPU] Running inference...
[NPU] Output shape: [1, 3, 256, 256]

  Cosine Similarity: 1.000000
  MaxAbsErr:         3.401489
  Relative Error:    0.1827%

  Avg latency:  28.16 ms
  Peak HBM: 0.4361 GB

PASS: CPU vs NPU outputs match (cosine >= 0.99)

6. CPU vs. NPU Accuracy Comparison

Metric	CPU (float32)	NPU (float32)	Error
Cosine similarity	Reference	1.000000	< 0.001%
Max absolute error	-	3.401489	-
Relative error	-	0.1827%	< 1% ✓
Output shape	[1, 3, 256, 256]	[1, 3, 256, 256]	Match
NaN values	False	False	Match
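
The metrics in this table can be reproduced from two flattened output arrays. The sketch below uses only the standard library and is not the code in inference.py. In particular, the relative error is assumed here to be the L2 norm of the difference divided by the L2 norm of the CPU reference, and the script may define it differently. The names `compare_outputs` and `passes` are hypothetical. The thresholds mirror the ones reported above (cosine >= 0.99, relative error < 1%).

```python
import math


def compare_outputs(ref, test):
    """Compare two flattened outputs given as equal-length float sequences.

    Assumes `ref` and `test` are not all zeros (norms must be non-zero).
    """
    if len(ref) != len(test):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(r * t for r, t in zip(ref, test))
    ref_norm = math.sqrt(sum(r * r for r in ref))
    test_norm = math.sqrt(sum(t * t for t in test))
    diff_norm = math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)))
    return {
        "cosine": dot / (ref_norm * test_norm),
        "max_abs_err": max(abs(r - t) for r, t in zip(ref, test)),
        "rel_err": diff_norm / ref_norm,  # assumed definition, see lead-in
        "has_nan": any(math.isnan(v) for v in test),
    }


def passes(metrics, min_cosine=0.99, max_rel_err=0.01):
    """Apply the pass criteria: no NaN, cosine >= 0.99, relative error < 1%."""
    return (
        not metrics["has_nan"]
        and metrics["cosine"] >= min_cosine
        and metrics["rel_err"] < max_rel_err
    )


# Toy values standing in for the flattened CPU and NPU outputs.
cpu_out = [10.0, 20.0, 30.0, 40.0]
npu_out = [10.0, 20.1, 30.0, 39.9]
metrics = compare_outputs(cpu_out, npu_out)
print(metrics)
print("PASS" if passes(metrics) else "FAIL")
```

The maximum absolute error is expressed in the units of the output itself, so it is best read together with the relative error rather than on its own.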

7. Model Architecture

Architecture: NAFNet (Nonlinear Activation Free Network)
Input: blurred image [1, 3, 256, 256]
Output: deblurred image [1, 3, 256, 256]
Training data: REDS dataset

8. Verification Report

See screenshots/verification.txt for details.

9. Agent Skill

This adaptation was completed automatically by the Ascend NPU adaptation Agent Skill.
