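GroundingDINO is a state-of-the-art open-set detection model that handles several vision tasks, including open-vocabulary detection (OVD), phrase grounding (PG), and referring expression comprehension (REC). Its effectiveness has led to wide adoption, and it has become a mainstream architecture for a range of downstream applications.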
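Table 1. Version compatibility

| Component | Version | Environment setup guide |
|---|---|---|
| Firmware and driver | 25.2.RC1 | PyTorch framework inference environment setup |
| CANN | 8.2.RC1 | Includes the kernels package and the toolkit package |
| Python | 3.11 | - |
| PyTorch | 2.1.0 | - |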
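
The Python and PyTorch rows of Table 1 can be checked from the interpreter before continuing. The following is a minimal sketch and not part of the repository: the helper names `find_version_mismatches` and `detect_installed` are hypothetical, and the comparison is a simple prefix match, so `2.1.0+cpu` is treated as matching `2.1.0`.

```python
import sys

# Versions from Table 1 (Python and PyTorch rows only).
REQUIRED = {"Python": "3.11", "PyTorch": "2.1.0"}


def find_version_mismatches(installed, required=REQUIRED):
    """Return {name: (installed, required)} for every entry that does not match.

    A version matches when it equals the required string or extends it
    with a '.' or '+' suffix (for example '3.11.4' or '2.1.0+cpu').
    """
    mismatches = {}
    for name, want in required.items():
        have = installed.get(name)
        ok = have is not None and (
            have == want
            or have.startswith(want + ".")
            or have.startswith(want + "+")
        )
        if not ok:
            mismatches[name] = (have, want)
    return mismatches


def detect_installed():
    """Collect the versions visible from the current interpreter."""
    found = {"Python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import torch  # only present once PyTorch is installed
        found["PyTorch"] = torch.__version__
    except ImportError:
        pass
    return found


if __name__ == "__main__":
    for name, (have, want) in find_version_mismatches(detect_installed()).items():
        print(f"{name}: found {have}, Table 1 lists {want}")
```

The script prints nothing when both versions match; firmware, driver, and CANN versions are outside its scope.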
git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/built-in/cv/GroundingDINO
git clone https://github.com/open-mmlab/mmdetection
cd mmdetection
git reset --hard cfd5d3a9

# Download the BERT weights and place them in the mmdetection directory
https://huggingface.co/google-bert/bert-base-uncased/tree/main
# Download the MM-GroundingDINO weights and place them in the weights directory
https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth
# Download animals.png and place it in the images directory
https://github.com/microsoft/X-Decoder/tree/main/inference_demo/images
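
Because the inference commands expect these files at fixed locations, a quick existence check can catch a misplaced download early. The sketch below is an illustration only: `find_missing` is a hypothetical helper, the relative paths are the ones named in the download steps above, and it assumes the script is run from the mmdetection directory.

```python
from pathlib import Path

# Locations named in the download steps, relative to the mmdetection directory.
EXPECTED = [
    "bert-base-uncased",
    "weights/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth",
    "images/animals.png",
]


def find_missing(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing downloads:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All downloads are in place.")
```

The check only confirms that the paths exist; it does not validate file contents or checksums.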
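# Download the NLTK data (optional). MM-GroundingDINO may extract noun phrases during Phrase Grounding inference. The required models are downloaded at runtime, but they can be fetched in advance for environments without network access.
Option 1:
Download the models to ~/nltk_data: https://www.nltk.org/nltk_data/
Option 2:
import os
import nltk
# Expand "~" explicitly so the data lands in the home directory rather than a literal "~" folder
nltk_dir = os.path.expanduser('~/nltk_data')
nltk.download('punkt', download_dir=nltk_dir)
nltk.download('averaged_perceptron_tagger', download_dir=nltk_dir)

1. Copy the following files from ModelZoo-PyTorch/ACL_PyTorch/built-in/cv/GroundingDINO into the demo directory under mmdetection: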
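│ ├── image_demo_npu.py //single-image inference script provided by this repository
│ ├── video_demo_npu.py //video inference script provided by this repository
│ ├── register_im2col_to_torchair.py //torchair operator registration file provided by this repository
│ ├── register_roll_to_torchair.py //torchair operator registration file provided by this repository
│ ├── requirements.txt //provided by this repository
│ └── install_requirements.sh //one-click dependency installation script provided by this repository

2. Place the patch files in the diff_patch directory under mmdetection: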
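│ ├── mmdetection_diff.patch //provided by this repository
│ ├── mmengine_diff.patch //provided by this repository
│ └── mmcv_diff.patch //provided by this repository

The complete directory layout is as follows: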
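mmdetection
├── demo
│ ├── image_demo.py
│ ├── video_demo.py
│ ├── image_demo_npu.py //single-image inference script provided by this repository
│ ├── video_demo_npu.py //video inference script provided by this repository
│ ├── register_im2col_to_torchair.py //torchair operator registration file provided by this repository
│ ├── register_roll_to_torchair.py //torchair operator registration file provided by this repository
│ ├── requirements.txt //provided by this repository
│ └── install_requirements.sh //one-click dependency installation script provided by this repository
├── diff_patch
│ ├── mmdetection_diff.patch //provided by this repository
│ ├── mmengine_diff.patch //provided by this repository
│ └── mmcv_diff.patch //provided by this repository
├── bert-base-uncased //BERT weights
├── weights
│ └── grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth //MM-GroundingDINO weights
├── images
│ └── animals.png
├── mmdet
├── resources
├── README.md
├── tests
├── tools
├── configs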
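└── ...

conda create -n groundingdino python=3.8
conda activate groundingdino
# Run the one-click dependency installation script from the mmdetection directory; it automatically pulls the dependency repositories and applies the diff patch files
source demo/install_requirements.sh

# Select the NPU ID to use (default: 0)
export ASCEND_RT_VISIBLE_DEVICES=0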
# Run image inference (optionally append --loop 10)
python demo/image_demo_npu.py images/animals.png configs/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py --weight weights/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth --texts '$: coco' --device npu
# Run video inference (optionally append --batch_size 16)
python demo/video_demo_npu.py demo/demo.mp4 configs/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py weights/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth

Once inference starts, a warm-up pass runs first by default to trigger the initial compilation, which takes a relatively long time. After the warm-up finishes, the script runs inference and prints the results and performance data.
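
Why the warm-up is kept out of the reported numbers can be illustrated with a generic timing loop. This is a sketch under assumptions, not the logic of image_demo_npu.py: `benchmark`, its parameters, and `FakeModel` are hypothetical, and the dummy workload only simulates a slow first call. On an asynchronous device, a synchronization call would also be needed before each clock read.

```python
import time


def benchmark(fn, warmup=1, loops=10):
    """Call fn `warmup` times untimed, then return the mean seconds over `loops` timed calls."""
    for _ in range(warmup):
        fn()  # early calls absorb one-off costs such as graph compilation
    start = time.perf_counter()
    for _ in range(loops):
        fn()
    return (time.perf_counter() - start) / loops


class FakeModel:
    """Stand-in workload: the first call is slow, later calls are fast."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        time.sleep(0.05 if self.calls == 1 else 0.001)


if __name__ == "__main__":
    model = FakeModel()
    mean = benchmark(model, warmup=1, loops=10)
    print(f"mean latency after warm-up: {mean * 1000:.2f} ms")
```

With `warmup=0`, the slow first call would be averaged into the result and inflate the reported latency.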
The following errors occurred during inference:
[root@autodl-container-33ab4e930e-c62598b5 mmdetection]# python demo/image_demo_npu.py images/animals.png configs/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py --weight weights/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth --texts '$: coco' --device npu
Error 1:
/usr/local/lib64/python3.11/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
Error 2:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
[ERROR] 2025-12-23-12:04:37 (PID:20889, Device:-1, RankID:-1) ERR99999 UNKNOWN application exception

Fix for error 1: the installed NumPy version is incompatible with PyTorch (typically NumPy 2.x paired with a PyTorch build compiled against NumPy 1.x). Reinstall a matching version:
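# Uninstall the existing NumPy first
pip uninstall -y numpy
# Install a stable version compatible with Python 3.11 and PyTorch 2.1.0 (1.24.3 recommended)
pip install numpy==1.24.3

Fix for error 2: libGL.so.1 is a system-level graphics library and must be installed according to your Linux distribution: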
# Install the core libGL dependencies (yum-based distributions)
yum install -y mesa-libGL-devel mesa-libGLU-devel
# Install other common OpenCV dependencies to avoid follow-up errors
yum install -y libX11-devel libXext-devel libXrender-devel libXt-devel
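
One way to confirm the fixes is to import the affected modules in a fresh interpreter. The sketch below is illustrative: `check_imports` is a hypothetical helper, and it assumes `cv2` (OpenCV) is the module that loads libGL.so.1, which the log above does not show explicitly. It reports only hard import failures, so the NumPy warning from error 1 would still need to be checked on stderr when torch is imported.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error message} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or OSError from a missing shared library
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # numpy: confirms the reinstall; cv2: assumed to be what loads libGL.so.1.
    failures = check_imports(["numpy", "cv2"])
    for name, message in failures.items():
        print(f"{name} failed to import -> {message}")
    if not failures:
        print("numpy and cv2 imported successfully.")
```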