BlendMask is an instance segmentation algorithm published at CVPR 2020. Its Blender module fuses top-down and bottom-up approaches, keeping accuracy high while speeding up inference.
Table 1 Version compatibility
| Item | Version | Environment setup guide |
|---|---|---|
| Machine model | Atlas800I A2 | - |
| AI accelerator chip | Ascend 910B4 | - |
| Python | 3.11 | - |
| mindie | 2.3.0 | - |
# Image
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.3.0-800I-A2-py311-openeuler24.03-lts
# Start the container
docker run -dit --privileged --ipc=host --name=BlendMask_test --shm-size=1000g \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/sbin:/usr/local/sbin \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /home:/home \
-v /data:/data \
-v /tmp:/tmp \
97f9fadfa336 \
/bin/bash
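# Enter the container (the name must match the --name passed to docker run)
docker exec -it BlendMask_test bash
# Clone the code repository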
git clone https://github.com/aim-uofa/AdelaiDet.git
cd AdelaiDet
# Install dependencies
python setup.py build develop
# Install detectron2
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
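After the install steps above, it can help to confirm that the packages are importable inside the container before going further. The following is a minimal sketch using only the Python standard library. The module list (`torch`, `torch_npu`, `detectron2`, `adet`) is an assumption based on the steps in this guide, and `check_modules` is a hypothetical helper name, not part of AdelaiDet or detectron2.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: bool}, True when the module can be located."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised packages
            status[name] = False
    return status


if __name__ == "__main__":
    # Assumed module list, taken from the install steps in this guide
    wanted = ["torch", "torch_npu", "detectron2", "adet"]
    for module, found in check_modules(wanted).items():
        print(f"{module}: {'OK' if found else 'MISSING'}")
```

A `MISSING` entry only means the module cannot be located on the current Python path; it does not check that the package works on the NPU.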
# Ascend adaptation
vim ./adet/modeling/blendmask/blendmask.py
Add the following lines:
import torch_npu
from torch_npu.contrib import transfer_to_npu
# Weights
https://huggingface.co/ZjuCv/AdelaiDet/blob/main/R_101_3x.pth
# Dataset
https://www.modelscope.cn/datasets/PAI/COCO2017/files
# git-lfs
https://github.com/git-lfs/git-lfs/releases
Modify WEIGHTS in configs/BlendMask/R_101_3x.yaml:
_BASE_: "Base-BlendMask.yaml"
MODEL:
  # Weights path
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl"
  RESNETS:
    DEPTH: 101
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
OUTPUT_DIR: "output/blendmask/R_101_3x"
Modify IMS_PER_BATCH in configs/BlendMask/Base-BlendMask.yaml:
MODEL:
  META_ARCHITECTURE: "BlendMask"
  MASK_ON: True
  BACKBONE:
    NAME: "build_fcos_resnet_fpn_backbone"
  RESNETS:
    OUT_FEATURES: ["res3", "res4", "res5"]
  FPN:
    IN_FEATURES: ["res3", "res4", "res5"]
  PROPOSAL_GENERATOR:
    NAME: "FCOS"
  BASIS_MODULE:
    LOSS_ON: True
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: False
  FCOS:
    THRESH_WITH_CTR: True
    USE_SCALE: False
DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.01  # Note that RetinaNet uses a different default learning rate
  STEPS: (60000, 80000)
  MAX_ITER: 90000
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
Generate the dataset labels:
cd datasets/
python prepare_thing_sem_from_instance.py
# Inference demo (run from the AdelaiDet root directory)
python demo/demo.py --config-file configs/BlendMask/R_101_3x.yaml --input /data/AdelaiDet/bus.jpg --output /data/AdelaiDet/result.jpg --confidence-threshold 0.35 --opts MODEL.WEIGHTS /data/AdelaiDet/datasets/R_101_3x.pth
# Single-card training script
OMP_NUM_THREADS=1 python tools/train_net.py \
--config-file configs/BlendMask/R_101_3x.yaml \
--num-gpus 1 \
OUTPUT_DIR training_dir/R_101_3x_new
# Multi-card training script
OMP_NUM_THREADS=1 python tools/train_net.py \
--config-file configs/BlendMask/R_101_3x.yaml \
--num-gpus 4 \
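OUTPUT_DIR training_dir/R_101_3x_new

Table 2 Inference performance
| Item | Memory and card count | Performance |
|---|---|---|
| A2 | 32 GB × 1 card | 6.45 img/s |

Table 2 Inference performance
| Item | Memory and card count | Performance |
|---|---|---|
| A2 | 32 GB × 1 card | bs=12: 7.42 img/s |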
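
The img/s values above are throughput figures. This guide does not state how they were measured; one plausible approach, sketched here under that assumption, is to time a series of inference calls and divide the number of images by the elapsed time. `measure_throughput` and `infer_fn` are hypothetical names used for illustration only. On a real device, the inference call would also need to finish all device work (synchronise) before the clock is read.

```python
import time


def measure_throughput(infer_fn, batches, warmup=1):
    """Return images per second for infer_fn over a sequence of batches (lists of images)."""
    batches = list(batches)
    # Warm-up calls are excluded from timing (first calls often include compilation or caching)
    for batch in batches[:warmup]:
        infer_fn(batch)
    timed = batches[warmup:]
    if not timed:
        raise ValueError("need more batches than warm-up iterations")
    start = time.perf_counter()
    n_images = 0
    for batch in timed:
        infer_fn(batch)
        n_images += len(batch)
    elapsed = time.perf_counter() - start
    return n_images / elapsed


if __name__ == "__main__":
    # Stand-in workload: sleep 10 ms per "image" instead of running the model
    def fake_infer(batch):
        time.sleep(0.01 * len(batch))

    print(f"{measure_throughput(fake_infer, [[0] * 4] * 5):.1f} img/s")
```

Replacing `fake_infer` with a function that runs the model on a batch would give a figure comparable in kind, though not necessarily in value, to the table entries.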
pip uninstall -y rapidfuzz
pip install rapidfuzz==2.13.7
Modify ./detectron2/detectron2/utils/collect_env.py at line 150:
if has_gpu:
    devices = defaultdict(list)
    for k in range(torch.cuda.device_count()):
        # Original line, replaced by the two lines below:
        # cap = "_".join(str(x) for x in torch.cuda.get_device_capability(k))
        dev_cap = torch.cuda.get_device_capability(k)
        # Fall back to (8, 0) when no device capability is returned
        cap = "_".join(str(x) for x in (dev_cap if dev_cap is not None else (8, 0)))
        name = torch.cuda.get_device_name(k) + " (" + str(cap) + ")"
        devices[name].append(str(k))
    for name, devs in devices.items():
Modify /data/l00613958/AdelaiDet/detectron2/detectron2/engine/launch.py at line 101:
if has_gpu:
    assert num_gpus_per_machine <= torch.cuda.device_count()
global_rank = machine_rank * num_gpus_per_machine + local_rank
try:
    dist.init_process_group(
        # Original line: backend="NCCL" if has_gpu else "GLOO",
        backend="NCCL",
        init_method=dist_url,
        world_size=world_size,
        rank=global_rank,
        timeout=timeout,