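GroundingDINO Training: NPU Adaptation Guide

1. Model Overview and Use Case

GroundingDINO is a powerful and flexible open-set object detector. It proposes a new detection paradigm that combines a Transformer architecture fusing language and vision with a deep cross-modal alignment strategy, so that it can accurately localize the objects referred to by arbitrary text descriptions. The approach performs well on several zero-shot detection benchmarks, including COCO, LVIS, and ODinW, and with single-stage end-to-end training alone it matches or exceeds many fully supervised detectors. GroundingDINO offers a new way to approach object detection in the open world and shows the potential of deeply fusing language guidance with visual localization in complex tasks. This document describes how to migrate and adapt GroundingDINO training to the Ascend platform.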

2. Prepare the Runtime Environment

Component	Version
Python	3.11
torch	2.9.0
torch_npu	2.9.0
torchvision	0.24.0
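
The version table above can be checked automatically before training starts. The following is a minimal sketch, assuming the packages are installed under the distribution names shown in the table; the helper names `installed_version` and `find_mismatches` are hypothetical, and the exact-match rule (ignoring a local build tag after `+`) is an assumption that may be too strict for some builds.

```python
from importlib import metadata

# Versions copied from the table above.
EXPECTED = {
    "torch": "2.9.0",
    "torch_npu": "2.9.0",
    "torchvision": "0.24.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each mismatching package name to an (expected, found) tuple."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # A local build tag such as "+cpu" is ignored when comparing (assumption).
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


# Usage: print(find_mismatches(EXPECTED)) prints {} when everything matches.
```

An empty result means the environment agrees with the table; any other entry names the package to reinstall.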

2.1 Hardware Environment

Device model	NPU configuration
Atlas 800T A2	8 cards

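2.2 Prepare the Image

Image address: the base images matched to each Ascend Cloud release

2.2.1 Recommended image:

Model	Image name	Image address
910B	8.5.1-910-ubuntu22.04-py3.11 container image	docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.5.1-910-ubuntu22.04-py3.11

2.2.2 Pull the image:

docker pull --platform=arm64 swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.5.1-910-ubuntu22.04-py3.11

2.2.3 Start the container (the image tag on the last line must match the image that was pulled):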

docker run -itd --rm \
    --name GroundingDINO_train \
    --net=host \
    --shm-size=500g \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /etc/hccn.conf:/etc/hccn.conf \
    -v /{mount_path}:/{mount_path} \
    --privileged \
    swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.5.0-910b-ubuntu22.04-py3.11 bash
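
Once inside the container, it can help to confirm that the device nodes and driver files passed to `docker run` are actually visible. The sketch below only checks that the paths exist; the list is copied from the `--device` and `-v` options above, and the helper name `missing_paths` is hypothetical.

```python
import os

# Paths taken from the --device and -v options of the docker run command.
REQUIRED_PATHS = [
    "/dev/davinci_manager",
    "/dev/devmm_svm",
    "/dev/hisi_hdc",
    "/usr/local/dcmi",
    "/usr/local/Ascend/driver/tools/hccn_tool",
    "/usr/local/bin/npu-smi",
    "/usr/local/Ascend/driver/lib64",
    "/usr/local/Ascend/driver/version.info",
    "/etc/ascend_install.info",
    "/etc/hccn.conf",
]


def missing_paths(paths, exists=os.path.exists):
    """Return the subset of paths that is not visible in this environment."""
    return [path for path in paths if not exists(path)]


# Usage inside the container: print(missing_paths(REQUIRED_PATHS))
```

An empty list suggests the mounts are in place; it does not prove that the driver itself works, which still has to be checked with the NPU tooling.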

3. Running Guide

3.1 Download the mmdetection source code

git clone https://github.com/open-mmlab/mmdetection
cd mmdetection
git reset --hard cfd5d3a9
pip install -r requirements.txt

3.2 Download the GroundingDINO source code

git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/PyTorch/built-in/cv/detection/GroundingDINO_for_PyTorch/ groundingdino_npu

3.3 Install the base dependencies

pip install decorator attrs psutil scipy ml-dtypes cloudpickle tornado absl-py

# Adjust the ascend-toolkit path to match your installation
source /usr/local/Ascend/ascend-toolkit/set_env.sh

pip install opencv-python-headless

pip install mmengine==0.10.3

pip install jsonlines

pip install nltk

3.4 Install mmcv

git clone https://github.com/open-mmlab/mmcv

cd mmcv/mmcv

vim version.py  # change __version__ = '2.2.0' on the second line to '2.1.0', then save and exit

cd ..

pip install -r requirements.txt

# mmcv calls into setuptools; from version 82.0.0 on, the pkg_resources module is removed, which causes the build to fail
pip install "setuptools<82"

MMCV_WITH_OPS=1 MAX_JOBS=8 FORCE_NPU=1 python setup.py build_ext

pip install -e . --no-build-isolation  # disable build isolation so that pip uses the already configured environment directly
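
The manual `vim` edit of `version.py` can also be done non-interactively, which is convenient when the installation is scripted. This is a minimal sketch, assuming the file contains exactly one single-quoted `__version__` assignment as described in the comment above; the helper name `set_version` is hypothetical.

```python
import re

# Matches a line such as: __version__ = '2.2.0'
VERSION_LINE = re.compile(r"^__version__\s*=\s*'[^']*'", re.MULTILINE)


def set_version(source, new_version):
    """Return the file content with the __version__ assignment replaced."""
    patched, count = VERSION_LINE.subn(
        lambda _match: f"__version__ = '{new_version}'", source)
    if count != 1:
        raise ValueError(f"expected one __version__ line, found {count}")
    return patched


# Usage from the root of the cloned mmcv repository:
#   from pathlib import Path
#   path = Path("mmcv/version.py")
#   path.write_text(set_version(path.read_text(), "2.1.0"))
```

Raising an error when the line is not found exactly once avoids silently building with the wrong version string.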

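3.5 Install DrivingSDK

git clone https://gitcode.com/Ascend/DrivingSDK.git

cd DrivingSDK

bash ci/build.sh --python=3.11

# The wheel file name contains the git hash of the build; adjust the name and path to the file your build produced
pip install mx_driving-1.0.0+git4373151-cp311-cp311-linux_aarch64.whl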

3.6 Download the dataset

Training and evaluation use the refcoco dataset. Place it in the refcoco folder under the mmdetection directory.

1. Image download link: http://images.cocodataset.org/zips/train2014.zip

2. Annotation file download link: https://huggingface.co/GLIPModel/GLIP/tree/main/mdetr_annotations

Files to download: finetune_refcoco_val.json, finetune_refcoco_testA.json, finetune_refcoco_testB.json, finetune_refcoco_train.json

Create a folder named mdetr_annotations under the refcoco directory and place these four JSON files in ./refcoco/mdetr_annotations.

The dataset folder structure is as follows:

refcoco
├── train2014
│   ├── xxx.jpg
│   └── ...
└── mdetr_annotations
    ├── finetune_refcoco_val.json
    ├── finetune_refcoco_testA.json
    ├── finetune_refcoco_testB.json
    └── finetune_refcoco_train.json
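
The folder structure above can be verified with a short script before the conversion step is run. The sketch below checks only the directory and file names listed in this section; the helper name `missing_dataset_items` is hypothetical.

```python
from pathlib import Path

# The four annotation files named in this section.
ANNOTATION_FILES = [
    "finetune_refcoco_val.json",
    "finetune_refcoco_testA.json",
    "finetune_refcoco_testB.json",
    "finetune_refcoco_train.json",
]


def missing_dataset_items(root):
    """Return a list describing every expected item absent under root."""
    root = Path(root)
    missing = []
    image_dir = root / "train2014"
    if not image_dir.is_dir():
        missing.append(str(image_dir))
    elif not any(image_dir.glob("*.jpg")):
        missing.append(f"{image_dir} (no .jpg files)")
    for name in ANNOTATION_FILES:
        path = root / "mdetr_annotations" / name
        if not path.is_file():
            missing.append(str(path))
    return missing


# Usage from the mmdetection directory: print(missing_dataset_items("refcoco"))
```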

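3.7 Convert the data format

GroundingDINO uses annotations in the ODVG format, so the JSON conversion script must be run first:

cd mmdetection
python tools/dataset_converters/refcoco2odvg.py refcoco/mdetr_annotations

The command above generates the file finetune_refcoco_train_vg.json in the mdetr_annotations folder.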
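
A quick sanity check on the converted file is to load a few records and look at their keys. The sketch below assumes the generated file is in JSON Lines form (one JSON object per line), which is an assumption about the converter's output rather than something stated in this guide; if the file turns out to be a single JSON document, `json.load` would be needed instead. The helper name `read_jsonl` is hypothetical.

```python
import json


def read_jsonl(path, limit=None):
    """Read up to `limit` JSON objects from a file with one object per line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
            if limit is not None and len(records) >= limit:
                break
    return records


# Usage from the mmdetection directory (assumed path from this section):
#   sample = read_jsonl(
#       "refcoco/mdetr_annotations/finetune_refcoco_train_vg.json", limit=1)
#   print(sample[0].keys())
```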

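3.8 Download the weights

Training GroundingDINO requires three assets to be prepared in advance: the BERT model (the text encoder that extracts semantic features), the NLTK data (used for text preprocessing such as tokenization and part-of-speech tagging), and the MM-GDINO-B weights (the pretrained model parameters). Together they support the model's open-vocabulary detection capability.

1. BERT model download

wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz

Extract the archive and place the BERT model files in the ./weights/bert directory.

2. NLTK data
https://github.com/nltk/nltk_data

3. MM-GDINO-B weights

This fine-tuning run uses the Swin-B variant. Weights for other variants are listed in the mmdetection repository:
https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino

See the table "Zero-Shot COCO Results and Models".

Weight file to download: grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth

wget https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth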
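
Large checkpoint downloads are sometimes truncated, so it is worth checking the file before training. The sketch below assumes the 8-character suffix in the file name (`f83eef00`) is the beginning of the file's SHA-256 digest, which follows the usual OpenMMLab publishing convention but is not confirmed by this guide; the helper names are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Matches the trailing "-xxxxxxxx.pth" hash suffix in a checkpoint name.
HASH_SUFFIX = re.compile(r"-([0-9a-f]{8})\.pth$")


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches_name(path):
    """Return True if the digest starts with the hash suffix in the file name."""
    match = HASH_SUFFIX.search(Path(path).name)
    if match is None:
        raise ValueError(f"no 8-character hash suffix in {path}")
    return sha256_of(path).startswith(match.group(1))


# Usage:
#   checkpoint_matches_name(
#       "weights/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth")
```

A `False` result most likely indicates an incomplete download, provided the naming assumption holds.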

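3.9 Modify the configuration file

This guide mainly provides an 8-card training script for full fine-tuning on the refcoco dataset. The configuration file used for this fine-tuning run is groundingdino_npu/mm_grounding_dino_swin-b_finetune_b2_refcoco.py

# Dataset and weight paths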
lang_model_name = './weights/bert'
data_root = './refcoco/'
load_from = "./weights/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth"

# Training set dataloader
train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    dataset=dict(
        _delete_=True,
        type='ODVGDataset',
        data_root=data_root,
        ann_file="mdetr_annotations/finetune_refcoco_train_vg.json",  # annotation file path
        data_prefix=dict(img='train2014/'),  # training image directory
        filter_cfg=dict(filter_empty_gt=False, min_size=32),
        return_classes=True,
        pipeline=train_pipeline))
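
The paths in this configuration are relative, so they resolve against the directory from which training is launched, and `ann_file` and `data_prefix` are joined onto `data_root`. The sketch below prints the resulting absolute locations so they can be compared with the files prepared earlier; it is a standalone illustration, not part of the configuration, and the helper name `resolve_config_paths` is hypothetical.

```python
import os


def resolve_config_paths(data_root, ann_file, img_prefix,
                         lang_model_name, load_from, base_dir="."):
    """Return the normalized locations that the relative config values point to."""
    def absolute(path):
        return os.path.normpath(os.path.join(base_dir, path))

    return {
        "ann_file": absolute(os.path.join(data_root, ann_file)),
        "img_dir": absolute(os.path.join(data_root, img_prefix)),
        "lang_model": absolute(lang_model_name),
        "load_from": absolute(load_from),
    }


# Usage with the values from the configuration above:
#   paths = resolve_config_paths(
#       "./refcoco/",
#       "mdetr_annotations/finetune_refcoco_train_vg.json",
#       "train2014/",
#       "./weights/bert",
#       "./weights/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth",
#       base_dir=os.getcwd())
#   for name, path in paths.items():
#       print(name, path, os.path.exists(path))
```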

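3.10 Modify the launch script

Edit the groundingdino_npu/finetune_refcoco.sh file:

# Dataset and weight paths

# Adjust the ascend-toolkit path
source groundingdino_npu/env_npu.sh

# Set the Python path
PYTHON_PATH="Python Env Path"

GPUS=1  # number of cards to use (for example, 8 for the 8-card configuration)
batch_size=2  # batch size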

3.11 Start training

bash groundingdino_npu/finetune_refcoco.sh

4. Training Results

4.1 Accuracy

Chip	AP	Precision @ 1	Precision @ 5	Precision @ 10
Atlas A2	0.3129	0.8421	0.9807	0.9932

4.2 Performance

Chip	Cards	FPS	Batch_size	AMP_Type	Torch_Version
Atlas A2	8P	8.45	2	fp32	2.4.0
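
The performance row can be turned into a rough time per training iteration. The sketch below assumes that the reported FPS is the aggregate number of images per second across all cards and that Batch_size is the per-card batch size; neither is stated in the table, so the result is only indicative. The function names are hypothetical.

```python
def global_batch_size(per_card_batch, cards):
    """Images processed per iteration across all cards."""
    return per_card_batch * cards


def seconds_per_iteration(fps, per_card_batch, cards):
    """Iteration time, assuming fps counts images per second over all cards."""
    return global_batch_size(per_card_batch, cards) / fps


# Values from the performance table: FPS 8.45, batch size 2, 8 cards.
step_time = seconds_per_iteration(8.45, per_card_batch=2, cards=8)
print(f"global batch size: {global_batch_size(2, 8)}")
print(f"seconds per iteration: {step_time:.2f}")
```

Under these assumptions each iteration processes 16 images and takes roughly 1.89 seconds.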