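detr-resnet-50-npu Ascend NPU Deployment Guide
Overview
DETR-ResNet-50 is an end-to-end object detection model (DEtection TRansformer) proposed by Facebook. It uses ResNet-50 as the convolutional backbone and a Transformer encoder-decoder to predict objects directly. The model is trained on the COCO 2017 dataset and uses the 91-ID COCO label space.
Features
- Inference acceleration on Ascend NPU
- CPU vs NPU precision comparison test (error < 1%)
- End-to-end object detection (no NMS post-processing required)
- 100 object queries
- 91 detection classes (COCO dataset)
Environment
File Structure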
detr-resnet-50-ascend/
├── inference.py # inference test script
├── test.log # test log
└── README.md # this document
Deployment Steps
1. Set environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh
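To confirm that the environment is usable, one option is to probe for the NPU from Python once the dependencies from step 3 are installed. The sketch below assumes that importing torch_npu makes `torch.npu.is_available()` and `torch.npu.device_count()` available; the helper name `describe_npu` is hypothetical, and it returns a message instead of failing when the packages are missing.

```python
def describe_npu() -> str:
    """Return a short, human-readable status of the Ascend NPU runtime."""
    try:
        import torch
        import torch_npu  # noqa: F401  (assumption: importing registers the "npu" backend)
    except ImportError as exc:
        # Covers both missing packages and shared libraries that fail to load
        return f"unavailable: {exc}"
    if not torch.npu.is_available():
        return "unavailable: no NPU visible (was set_env.sh sourced in this shell?)"
    return f"available: {torch.npu.device_count()} NPU device(s)"


print(describe_npu())
```

If the result starts with "unavailable", re-running the `source` command in the same shell is a reasonable first thing to try.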
2. Prepare the model files
The model files are located in /opt/atomgit/mxy/detr-resnet-50/:
- model.safetensors - model weights (about 167MB)
- config.json - model configuration
- tokenizer.json - tokenizer file
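Before running inference, it can help to check that the files listed above are actually present. This is a minimal sketch using only the standard library; the function name `missing_model_files` is hypothetical, and the file list simply mirrors the one above.

```python
from pathlib import Path

# File names taken from the list above; adjust if your checkpoint differs.
REQUIRED_FILES = ("model.safetensors", "config.json", "tokenizer.json")


def missing_model_files(model_dir: str) -> list[str]:
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("/opt/atomgit/mxy/detr-resnet-50")
    print("missing files:", missing or "none")
```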
3. Install dependencies
pip install transformers torch_npu
4. Run inference
cd detr-resnet-50-ascend/
python3 inference.py --mode inference
Usage
Method 1: Normal Inference Mode
cd detr-resnet-50-ascend/
python3 inference.py --mode inference --device npu:0
Method 2: Precision Test Mode (CPU vs NPU)
cd detr-resnet-50-ascend/
python3 inference.py --mode precision_test
Command-Line Arguments
| Argument | Description | Default |
|---|---|---|
| --mode | Test mode: inference or precision_test | inference |
| --device | Device to run on | npu:0 |
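The source of inference.py is not shown here, so the following is only a sketch of how a parser with these two options and defaults could be written with `argparse`. The function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the two documented options and their defaults."""
    parser = argparse.ArgumentParser(description="DETR-ResNet-50 NPU inference test")
    parser.add_argument(
        "--mode",
        choices=["inference", "precision_test"],
        default="inference",
        help="test mode: plain inference or CPU vs NPU precision comparison",
    )
    parser.add_argument("--device", default="npu:0", help="device to run on")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable
args = build_parser().parse_args(["--mode", "precision_test"])
print(args.mode, args.device)  # precision_test npu:0
```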
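Test Validation
Precision Test Results
| Metric | Measured | Threshold | Status |
|---|---|---|---|
| Object detection precision | Within normal range | - | ✅ PASS |
| Overall assessment | Within normal range | - | ✅ PASS |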
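This README does not say which metric the precision test uses. One plausible way to compare CPU and NPU outputs against a 1% bound is the maximum absolute difference relative to the largest reference magnitude, sketched below. The function name `max_relative_error` and the sample values are illustrative assumptions, not measured data.

```python
def max_relative_error(reference: list[float], candidate: list[float], eps: float = 1e-12) -> float:
    """Largest absolute difference, divided by the largest reference magnitude."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    scale = max(abs(r) for r in reference) + eps  # eps avoids division by zero
    return max(abs(r - c) for r, c in zip(reference, candidate)) / scale


# Illustrative values only (not taken from a real run)
cpu_scores = [0.896, 0.120, 0.034]
npu_scores = [0.895, 0.121, 0.034]
err = max_relative_error(cpu_scores, npu_scores)
print(f"max relative error: {err:.4%}, pass: {err < 0.01}")
```

In practice the inputs would be flattened model outputs (for example logits or box coordinates) from the same image on both devices.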
Performance Data
| Operation | Time |
|---|---|
| NPU inference time (savanna.jpg, first run with operator compilation) | ~7.2s |
| NPU inference time (football.jpg) | ~0.05s |
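Because the first call carries one-time costs, it is often useful to report it separately from steady-state latency. The helper below is a generic sketch: `time_inference` and `run_once` are hypothetical names, and the dummy workload stands in for a real model call.

```python
import time
from typing import Callable


def time_inference(run_once: Callable[[], object], repeats: int = 5) -> tuple[float, float]:
    """Return (first_call_seconds, mean_seconds_over_later_calls)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    start = time.perf_counter()
    run_once()  # first call: includes any one-time compilation cost
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    steady = (time.perf_counter() - start) / repeats
    return first, steady


# Dummy workload standing in for a model forward pass
first, steady = time_inference(lambda: sum(range(100_000)))
print(f"first call: {first:.6f}s, steady state: {steady:.6f}s")
```

When timing a real NPU model, `run_once` should probably also wait for the device to finish (for example via `torch.npu.synchronize()`, assuming torch_npu provides it as the CUDA API does), since device execution may be asynchronous.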
Test Log
============================================================
DETR-ResNet-50 NPU Inference Test
============================================================
Model: /opt/atomgit/mxy/detr-resnet-50
Output: /data/mxy/detr-resnet-50-ascend
Device: npu:0
Using device: npu:0
Found 2 test images in /data/mxy/detr-resnet-50-ascend/test_images
============================================================
Loading DETR-ResNet-50 model...
Model directory: /opt/atomgit/mxy/detr-resnet-50
============================================================
Model type: DetrForObjectDetection
Decoder layers: 6
Num queries: 100
============================================================
Processing: savanna.jpg - Savanna scene with animals
Image size: (1024, 1024)
Input shape: torch.Size([1, 3, 800, 800])
Inference time: 7.200s
Detected 0 objects:
Processing: football.jpg - Football match scene
Image size: (600, 831)
Input shape: torch.Size([1, 3, 1108, 800])
Inference time: 0.048s
Detected 1 objects:
- person: 0.896 at [471.65, 58.11, 563.04, 132.31]
============================================================
Inference Summary
============================================================
Total images processed: 2
Total inference time: 7.248s
Average time per image: 3.624s
============================================================
Test Complete!
============================================================
Python API Example
Basic Object Detection
import torch
import torch_npu  # registers the "npu" device backend with PyTorch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

MODEL_DIR = "/opt/atomgit/mxy/detr-resnet-50"
DEVICE = "npu:0"

# Load the image processor and the model, then move the model to the NPU
processor = DetrImageProcessor.from_pretrained(MODEL_DIR)
model = DetrForObjectDetection.from_pretrained(MODEL_DIR)
model = model.to(DEVICE)
model.eval()

# Preprocess the image and move the input tensors to the NPU
image = Image.open("test.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# Post-process to get detections; target_sizes expects (height, width)
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    coords = [round(v, 2) for v in box.tolist()]
    print(f"{model.config.id2label[label.item()]}: {score.item():.3f} at {coords}")
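The `image.size[::-1]` step exists because PIL reports (width, height) while the post-processing expects (height, width). DETR predicts boxes as normalized (center_x, center_y, width, height), and post-processing rescales them to absolute corner coordinates. The pure-Python sketch below illustrates that conversion; the function name `cxcywh_to_xyxy` is hypothetical and this is not the library's own code.

```python
def cxcywh_to_xyxy(box, image_height, image_width):
    """Convert a normalized (cx, cy, w, h) box to absolute (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,   # x_min
        (cy - h / 2) * image_height,  # y_min
        (cx + w / 2) * image_width,   # x_max
        (cy + h / 2) * image_height,  # y_max
    )


# A centred box covering half of each side of a 640x480 (width x height) image
print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), image_height=480, image_width=640))
# (160.0, 120.0, 480.0, 360.0)
```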
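Model Structure
| Component | Description |
|---|---|
| backbone | ResNet-50 visual encoder |
| transformer | Transformer encoder-decoder (6 layers) |
| query_pos_embed | Positional embeddings for the object queries (100 queries) |
| class_logits | Classification output over 91 classes |
| bbox_embed | Bounding box regression output |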
Inference Parameters
| Parameter | Value |
|---|---|
| decoder_layers | 6 |
| num_queries | 100 |
| hidden_size | 256 |
| num_attention_heads | 8 |
| Number of object classes | 91 (COCO) |
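These values can be read back from config.json for a quick sanity check. The sketch below uses only the standard library. It assumes that config.json may store `hidden_size` and `num_attention_heads` under the keys `d_model` and `encoder_attention_heads` (the names used by the Transformers DETR configuration, as far as I know); the function name `read_inference_params` is hypothetical.

```python
import json
from pathlib import Path

# Assumed mapping from the names in the table to keys found in config.json
ALIASES = {"hidden_size": "d_model", "num_attention_heads": "encoder_attention_heads"}
PARAM_NAMES = ("decoder_layers", "num_queries", "hidden_size", "num_attention_heads")


def read_inference_params(config_path: str) -> dict:
    """Read the documented parameters from a config.json, trying aliases as a fallback."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    params = {}
    for name in PARAM_NAMES:
        # Prefer the documented name, then fall back to the assumed alias
        params[name] = config.get(name, config.get(ALIASES.get(name, name)))
    return params


if __name__ == "__main__":
    path = Path("/opt/atomgit/mxy/detr-resnet-50/config.json")
    if path.is_file():
        params = read_inference_params(str(path))
        print(params)
        if params["hidden_size"] and params["num_attention_heads"]:
            # Per-head width; with the table's values this is 256 // 8 = 32
            print("head dim:", params["hidden_size"] // params["num_attention_heads"])
```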
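Notes
- The model uses the NPU for inference acceleration
- DETR works end to end, so no NMS post-processing is needed
- The first inference incurs operator compilation overhead
- Detection results include bounding box coordinates and confidence scores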
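To illustrate why no NMS step is needed: each object query yields at most one prediction, and queries whose best class probability is low (typically because the "no object" class dominates) are simply dropped by a score threshold. The pure-Python sketch below shows that filtering under the assumption that the last logit of each query is the "no object" class; `detections_from_logits` is a hypothetical name and the logits are made up.

```python
import math


def detections_from_logits(query_logits, threshold=0.5):
    """Turn per-query class logits into (query_index, class_id, score) tuples.

    The last logit of each query is treated as the "no object" class.
    """
    detections = []
    for index, logits in enumerate(query_logits):
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        object_probs = [e / total for e in exps][:-1]  # drop "no object"
        score = max(object_probs)
        if score > threshold:
            detections.append((index, object_probs.index(score), score))
    return detections


# Two made-up queries over 3 classes plus "no object"
logits = [
    [0.1, 4.0, 0.2, 0.3],  # confident about class 1
    [0.0, 0.1, 0.2, 5.0],  # dominated by "no object", so it is dropped
]
for query, class_id, score in detections_from_logits(logits):
    print(f"query {query}: class {class_id} with score {score:.3f}")
```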