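The F3NET model addresses a weakness of existing salient object detection models: they ignore the differences between features from different layers of a convolutional neural network, which leads to suboptimal fusion. F3NET proposes a cross feature module (CFM) that selectively aggregates multi-scale features, and a cascaded feedback decoder (CFD) that eliminates feature differences through multi-stage feedback.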

| Component | Version |
|---|---|
| Firmware and driver | 25.3.rc1 |
| CANN | 8.2.rc1 |
| Python | 3.11.9 |
| PyTorch | 2.7.1 |
| torch_npu | 2.7.1 |
| opencv-python | 4.12.0.88 |
| SQLAlchemy | 1.45.4 |
| wtforms | 2.3.3 |
| setuptools | 59.8.0 |
| pyramid | mindie |
| Deployment | MindIE image or bare-metal deployment |
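
To compare an environment against the table above, a small script along these lines may help. It is only a sketch: the `EXPECTED` mapping copies a subset of the Python-level rows from the table, `check_versions` is a hypothetical helper name, and the distribution names passed to `importlib.metadata` are assumed to match the names in the table.

```python
from importlib import metadata

# Expected versions copied from the table above (a subset of the Python packages).
EXPECTED = {
    "torch": "2.7.1",
    "torch_npu": "2.7.1",
    "opencv-python": "4.12.0.88",
    "wtforms": "2.3.3",
    "setuptools": "59.8.0",
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # Drop a local build tag such as "+cpu" before comparing.
        base = installed.split("+", 1)[0]
        report[name] = (installed, base == wanted or base.startswith(wanted + "."))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:15s} expected {EXPECTED[name]:12s} found {installed} [{status}]")
```

Firmware, driver, and CANN versions are not visible through `importlib.metadata`, so they still have to be checked separately.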
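### (Option 1) Clone the F3NET source code from GitHub, then adapt it by modifying test.py, net.py, and train.py as described below
git clone https://github.com/weijun-arc/F3Net.git
### (Option 2) Download the already-adapted F3NET code from GitCode (the F3Net.tar archive) and extract it
git clone https://atomgit.com/Ascend-SACT/F3NET.git

Save the ResNet model parameters to the ./res directory and the Model-32 parameters to the ./src/out directory.

The F3NET paper uses five datasets: PASCAL-S, ECSSD, HKU-IS, DUT-OMRON, and DUTS-TE. Save each dataset's images and masks to ./data/<dataset name>/image and ./data/<dataset name>/mask respectively.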
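
The directory layout described above can be checked before running inference. The following is a minimal sketch, not part of the F3NET code: `check_dataset` is a hypothetical helper, and it assumes that an image and its mask share the same file name apart from the extension.

```python
from pathlib import Path

DATASETS = ["PASCAL-S", "ECSSD", "HKU-IS", "DUT-OMRON", "DUTS-TE"]


def check_dataset(root, name):
    """Return a list of problems for one dataset; an empty list means the layout looks fine."""
    problems = []
    image_dir = Path(root) / name / "image"
    mask_dir = Path(root) / name / "mask"
    for directory in (image_dir, mask_dir):
        if not directory.is_dir():
            problems.append(f"missing directory: {directory}")
    if problems:
        return problems
    # Assumption: image and mask files are paired by file stem.
    images = {p.stem for p in image_dir.iterdir() if p.is_file()}
    masks = {p.stem for p in mask_dir.iterdir() if p.is_file()}
    for stem in sorted(images - masks):
        problems.append(f"{name}: image '{stem}' has no mask")
    for stem in sorted(masks - images):
        problems.append(f"{name}: mask '{stem}' has no image")
    return problems


if __name__ == "__main__":
    for name in DATASETS:
        issues = check_dataset("./data", name)
        print(name, "OK" if not issues else issues)
```

Run it from the repository root so that `./data` resolves to the directory described above.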
vim docker_start.sh
# Script contents:
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
CONTAINER_NAME=your_container_name
IMAGE=your_image_id
docker run -itd --privileged --name=$CONTAINER_NAME --ipc=host \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/sbin:/usr/local/sbin \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /tmp:/tmp \
-v /mnt:/mnt \
-v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \
-v /home:/home \
-v /data:/data \
-w /home \
$IMAGE \
/bin/bash
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Enter the container
bash docker_start.sh
docker exec -it <container name> bash

Modify ./src/test.py
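
The test.py edits that follow replace each `.cuda()` call with `.npu()`. If the same script also has to run on machines without an NPU, one option is to resolve the device name once and pass it to `.to(...)`. The helper below is a hedged sketch under that assumption: `pick_device_name` is a hypothetical name and is not part of F3NET or torch_npu.

```python
import importlib.util


def pick_device_name():
    """Return "npu", "cuda", or "cpu" depending on what this machine offers."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # torch is not installed, so there is nothing to probe
    import torch

    if importlib.util.find_spec("torch_npu") is not None:
        try:
            import torch_npu  # noqa: F401  (importing it registers torch.npu)

            if torch.npu.is_available():
                return "npu"
        except Exception:
            # The package is present but the driver stack is not usable here.
            pass
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(pick_device_name())
```

With this, `self.net.to(pick_device_name())` and `image.to(pick_device_name()).float()` would stand in for the hard-coded `.npu()` calls. The torchair compile step still applies only when the result is `"npu"`.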
### Add torch_npu and torchair
import torch_npu
import torchair
from torchair.configs.compiler_config import CompilerConfig
### Change the device selection and add the torchair configuration
class Test(object):
    def __init__(self, Dataset, Network, path):
        ...
        # self.net.cuda()
        self.net.npu()
        config_torchair = CompilerConfig()
        npu_backend = torchair.get_npu_backend(compiler_config=config_torchair)
        self.net = torch.compile(self.net, backend=npu_backend)

    def show(self):
        with torch.no_grad():
            for image, mask, shape, name in self.loader:
                # image, mask = image.cuda().float(), mask.cuda().float()
                image, mask = image.npu().float(), mask.npu().float()

    def save(self):
        with torch.no_grad():
            for image, mask, shape, name in self.loader:
                # image = image.cuda().float()
                image = image.npu().float()

Modify ./src/net.py
### Since PyTorch 2.6, torch.load defaults to weights_only=True, so pass weights_only=False explicitly
class ResNet(nn.Module):
    ...
    def initialize(self):
        # self.load_state_dict(torch.load('../res/resnet50-19c8e357.pth'), strict=False)
        self.load_state_dict(torch.load('../res/resnet50-19c8e357.pth', weights_only=False), strict=False)

class F3Net(nn.Module):
    ...
    def initialize(self):
        if self.cfg.snapshot:
            # self.load_state_dict(torch.load(self.cfg.snapshot))
            self.load_state_dict(torch.load(self.cfg.snapshot, weights_only=False))
        else:
            weight_init(self)

Modify ./src/train.py
Because the amp calls in apex are tied directly to CUDA, remove all apex amp calls and use torch.npu.amp instead.
# from apex import amp
def train(Dataset, Network):
    ...
    # Remove the apex.amp.initialize call
    # net, optimizer = amp.initialize(net, optimizer, opt_level='O2')
    net = net.to("npu")
    scaler = torch.npu.amp.GradScaler()
    sw = SummaryWriter(cfg.savepath)
    global_step = 0
    for epoch in range(cfg.epoch):
        optimizer.param_groups[0]['lr'] = (1-abs((epoch+1)/(cfg.epoch+1)*2-1))*cfg.lr*0.1
        optimizer.param_groups[1]['lr'] = (1-abs((epoch+1)/(cfg.epoch+1)*2-1))*cfg.lr
        for step, (image, mask) in enumerate(loader):
            image, mask = image.npu().float(), mask.npu().float()
            # NPU AMP automatic mixed-precision context
            with torch.npu.amp.autocast():
                out1u, out2u, out2r, out3r, out4r, out5r = net(image)
                loss1u = structure_loss(out1u, mask)
                loss2u = structure_loss(out2u, mask)
                loss2r = structure_loss(out2r, mask)
                loss3r = structure_loss(out3r, mask)
                loss4r = structure_loss(out4r, mask)
                loss5r = structure_loss(out5r, mask)
                loss = (loss1u+loss2u)/2+loss2r/2+loss3r/4+loss4r/8+loss5r/16
            optimizer.zero_grad()
            # with amp.scale_loss(loss, optimizer) as scale_loss:
            #     scale_loss.backward()
            # optimizer.step()
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

In addition, because of a version compatibility problem between the apex package and the zope.interface package, the apex/models.py file needs to be modified; see the fix described on CSDN.
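
The learning-rate expression in the training loop above rises linearly to a peak near the middle of training and then falls back symmetrically. The sketch below isolates that expression so its shape can be inspected. `triangular_lr` is a hypothetical helper name, and the values 32 and 0.05 are illustrative only, not settings taken from this document.

```python
def triangular_lr(epoch, total_epochs, base_lr):
    """Learning rate for a 0-based epoch index, using the expression from the training loop."""
    return (1 - abs((epoch + 1) / (total_epochs + 1) * 2 - 1)) * base_lr


if __name__ == "__main__":
    total_epochs, base_lr = 32, 0.05  # illustrative values (assumption)
    for epoch in (0, 15, 16, 31):
        lr = triangular_lr(epoch, total_epochs, base_lr)
        # param_groups[0] uses one tenth of the rate applied to param_groups[1].
        print(f"epoch {epoch:2d}: group0 lr={lr * 0.1:.5f}  group1 lr={lr:.5f}")
```

The first and last epochs get the same small rate, and the two middle epochs share the peak, which is slightly below `base_lr` because the denominator is `total_epochs + 1`.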
From the ./src directory, run python test.py

| Adaptation | Single-image inference time |
|---|---|
| Out of the box | 0.927 s |
| torch_npu | 0.033 s |
| torch_npu + torchair | 0.009 s |
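
Per-image timings like those in the table are sensitive to warm-up, especially with `torch.compile`, where the first calls include compilation. The source does not say how the figures were measured. The following is one possible measurement sketch, with `mean_latency` as a hypothetical helper and a stand-in workload so that it runs without a device.

```python
import time


def mean_latency(fn, sample, warmup=3, runs=10, sync=None):
    """Average wall-clock seconds per call of fn(sample), excluding warm-up calls."""
    for _ in range(warmup):
        fn(sample)
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    start = time.perf_counter()
    for _ in range(runs):
        fn(sample)
    if sync is not None:
        sync()  # wait for queued device work before stopping the clock
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in workload; replace it with the model's forward call on a real device.
    latency = mean_latency(lambda n: sum(i * i for i in range(n)), 10_000)
    print(f"{latency * 1000:.3f} ms per call")
```

On an NPU, passing the model as `fn`, a preloaded image tensor as `sample`, and `torch.npu.synchronize` as `sync` would be the natural adaptation, assuming torch_npu is imported.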