
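detr-resnet-50 Ascend NPU Deployment Guide

Project Overview

DETR-ResNet-50 is the end-to-end object detection model (DEtection TRansformer) proposed by Facebook. It uses ResNet-50 as the visual backbone together with a Transformer encoder-decoder to perform object detection. The model was trained on the COCO 2017 dataset and predicts 91 object classes.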

Features

Inference acceleration on Ascend NPU
CPU vs NPU precision comparison test
End-to-end object detection (no NMS post-processing)
100 object queries
91 object classes (COCO dataset)

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
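transformers: 4.8+ (the DetrImageProcessor class used in the Python examples in this guide requires a release newer than 4.8)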

Directory Structure

detr-resnet-50-ascend/
├── inference.py          # Inference test script
├── log.txt               # Test log
├── README.md             # This document
└── test_images/          # Test images
    ├── savanna.jpg       # Scene image 1
    └── football.jpg      # Scene image 2

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

3. Prepare the model files

The model files are located in /data/ysws/agentsp/5-15/detr-resnet-50/:

model.safetensors - model weights (about 167MB)
config.json - model configuration
preprocessor_config.json - image preprocessing configuration
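
Before loading the model it can help to confirm that all three files are present. The following is a minimal sketch, not part of inference.py; the helper name `missing_model_files` is hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("model.safetensors", "config.json", "preprocessor_config.json")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# Reports every required file as missing if the directory does not exist.
missing = missing_model_files("/data/ysws/agentsp/5-15/detr-resnet-50")
print("missing files:", missing or "none")
```

An empty result means the directory is ready to be passed to `from_pretrained`.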

4. Install dependencies

pip install transformers torch_npu pillow
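
After installation, a quick check that the packages can be found may save a failed run later. This sketch only looks the modules up without importing them; the helper name `missing_modules` is hypothetical, and `torch` is added to the list on the assumption that torch_npu is used together with it.

```python
import importlib.util

# Import names for the packages installed above; pillow is imported as "PIL".
REQUIRED_MODULES = ("torch", "torch_npu", "transformers", "PIL")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing modules:", missing_modules() or "none")
```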

Usage

Option 1: Standard inference mode

Run the inference script to perform object detection:

cd /data/ysws/agentsp/5-15/detr-resnet-50-ascend/

python3 inference.py --mode inference

python3 inference.py --mode inference --device npu:0

Option 2: Precision test mode (CPU vs NPU)

Run the precision comparison test to verify that NPU results are consistent with CPU results:

cd /data/ysws/agentsp/5-15/detr-resnet-50-ascend/

python3 inference.py --mode precision_test

Command-Line Arguments

Parameter	Description	Default
`--mode`	Run mode: inference or precision_test	`inference`
`--device`	Device to run on: npu:0, cuda:0, cpu, or auto	`auto`
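
How inference.py resolves `auto` is not documented here. One plausible approach is sketched below; the function name `resolve_device` and the priority order (NPU, then CUDA, then CPU) are assumptions, not the script's confirmed behavior.

```python
def resolve_device(requested="auto"):
    """Map a --device value to a concrete device string.

    Hypothetical helper: the priority npu > cuda > cpu is an assumption.
    """
    if requested != "auto":
        return requested
    try:
        import torch
    except ImportError:
        return "cpu"
    try:
        import torch_npu  # noqa: F401  (registers the "npu" backend with torch)
        if torch.npu.is_available():
            return "npu:0"
    except Exception:
        # Any failure to load the NPU backend is treated as "no NPU".
        pass
    if torch.cuda.is_available():
        return "cuda:0"
    return "cpu"


print(resolve_device("auto"))
```

An explicit value such as `npu:0` is returned unchanged, so only `auto` triggers probing.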

Test Validation

Precision Test Results

Metric	Measured	Threshold	Status
Logits relative error	0.9704%	< 1.00%	PASS
Overall assessment	Within normal range	-	PASS
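
This guide does not state how the relative error is defined. One common choice is the L2 norm of the difference divided by the L2 norm of the CPU reference, sketched below on plain lists. The function name and the sample values are made up for illustration; with real tensors, `tensor.flatten().tolist()` yields a compatible list.

```python
import math


def relative_error_percent(reference, candidate):
    """Relative L2 error of candidate against reference, in percent."""
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    diff = math.sqrt(sum((r - c) ** 2 for r, c in zip(reference, candidate)))
    norm = math.sqrt(sum(r ** 2 for r in reference))
    if norm == 0.0:
        raise ValueError("reference norm is zero")
    return 100.0 * diff / norm


cpu_logits = [2.0, -1.0, 0.5]   # made-up values, not measurements
npu_logits = [2.01, -1.0, 0.5]  # made-up values, not measurements
err = relative_error_percent(cpu_logits, npu_logits)
print(f"relative error: {err:.4f}% ->", "PASS" if err < 1.0 else "FAIL")
```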

Performance Data

Operation	Time
NPU inference time (800x800 input)	~7.4s
CPU inference time (800x800 input)	~6.3s
Average inference time per image	~3.7s

Sample Inference Output

Processing: football.jpg - Football match scene
Image size: (600, 831)
Input shape: torch.Size([1, 3, 1108, 800])
Inference time: 0.054s
Detected 1 objects:
  - person: 0.896 at [471.65, 58.11, 563.04, 132.31]

Python API Examples

Basic object detection

import torch
import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

MODEL_DIR = "/data/ysws/agentsp/5-15/detr-resnet-50"

# Load the preprocessor and the model, then move the model to the NPU.
processor = DetrImageProcessor.from_pretrained(MODEL_DIR)
model = DetrForObjectDetection.from_pretrained(MODEL_DIR)
model = model.to("npu:0")
model.eval()

# Preprocess one image and move the resulting tensors to the NPU.
image = Image.open("test.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# PIL reports (width, height); post-processing expects (height, width).
target_sizes = torch.tensor([image.size[::-1]]).to("npu:0")
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.7
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    label_name = model.config.id2label.get(label.item(), 'unknown')
    print(f"Detected {label_name} with confidence {score:.3f} at {box.tolist()}")
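
For readers wondering where the pixel coordinates in the output come from: DETR predicts boxes as normalized (center x, center y, width, height), and `post_process_object_detection` rescales them to absolute corner coordinates using `target_sizes`. The pure-Python sketch below illustrates that conversion for a single box; the helper name is hypothetical and this is not the library's own code.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


# A centered box covering half of each side of a 600x831 (width x height) image.
print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), 600, 831))
# -> (150.0, 207.75, 450.0, 623.25)
```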

Batch image processing

import torch
import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

MODEL_DIR = "/data/ysws/agentsp/5-15/detr-resnet-50"

processor = DetrImageProcessor.from_pretrained(MODEL_DIR)
model = DetrForObjectDetection.from_pretrained(MODEL_DIR).to("npu:0")
model.eval()

image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
images = [Image.open(p).convert("RGB") for p in image_paths]

# The processor pads images of different sizes into one batch tensor.
inputs = processor(images=images, return_tensors="pt")
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# One (height, width) row per image, in the same order as the batch.
target_sizes = torch.tensor([img.size[::-1] for img in images]).to("npu:0")
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.7)

for i, result in enumerate(results):
    print(f"\nImage {i+1}: {len(result['scores'])} objects detected")
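
With many images, loading everything into a single batch may exceed device memory. A simple option is to split the path list into fixed-size chunks and run the batch code above once per chunk. The helper below is a sketch; the name `chunked`, the batch size of 2, and the file names are illustrative assumptions.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


image_paths = [f"image{i}.jpg" for i in range(1, 6)]  # hypothetical file names
for batch in chunked(image_paths, batch_size=2):
    print(batch)  # each batch would be opened, preprocessed, and run together
```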

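Model Architecture

Architecture: DETR (DEtection TRansformer)
Visual encoder: ResNet-50 (ConvNet backbone)
Transformer decoder: 6 layers, 8 attention heads
Hidden dimension: 256
Object queries: 100
Detectable classes: 91 (COCO dataset)

Component	Description
backbone	ResNet-50 convolutional encoder
encoder	6-layer Transformer encoder
decoder	6-layer Transformer decoder (100 object queries)
class_labels_classifier	91-class classifier
bbox_predictor	Bounding-box prediction MLP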

Inference Configuration

Key parameters extracted from config.json:

{
  "model_type": "detr",
  "d_model": 256,
  "decoder_layers": 6,
  "decoder_attention_heads": 8,
  "num_queries": 100,
  "backbone": "resnet50"
}
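
These values can be read with the standard `json` module. The sketch below parses the same fragment inline (in practice it would be read from config.json in the model directory) and derives two numbers from it: the per-head attention dimension and the upper bound on detections per image.

```python
import json

# Same fragment as shown above, embedded as a string for illustration.
CONFIG_TEXT = """
{
  "model_type": "detr",
  "d_model": 256,
  "decoder_layers": 6,
  "decoder_attention_heads": 8,
  "num_queries": 100,
  "backbone": "resnet50"
}
"""

config = json.loads(CONFIG_TEXT)

# Each attention head works on d_model / heads channels: 256 // 8 = 32.
head_dim = config["d_model"] // config["decoder_attention_heads"]
print(f"per-head dimension: {head_dim}")

# Every object query yields at most one detection, so this is the cap per image.
print(f"max detections per image: {config['num_queries']}")
```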

Image Preprocessing Configuration

Extracted from preprocessor_config.json:

{
  "do_normalize": true,
  "do_resize": true,
  "image_mean": [0.485, 0.456, 0.406],
  "image_std": [0.229, 0.224, 0.225],
  "size": {"shortest_edge": 800, "longest_edge": 1333}
}
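
The `size` entry means the shorter side is scaled to 800 pixels unless that would push the longer side past 1333, in which case the longer side is capped instead. The sketch below approximates that rule; the function name is hypothetical, and the processor's exact rounding may differ by a pixel. It reproduces the sample log in this guide, where a 600x831 image becomes a tensor of height 1108 and width 800.

```python
def detr_resize_shape(width, height, shortest_edge=800, longest_edge=1333):
    """Approximate output (width, height) of the shortest-edge resize rule."""
    short, long = min(width, height), max(width, height)
    scale = shortest_edge / short
    if long * scale > longest_edge:
        # Scaling the short side to the target would overshoot the long-side cap.
        scale = longest_edge / long
    return round(width * scale), round(height * scale)


print(detr_resize_shape(600, 831))    # football.jpg from the sample log
print(detr_resize_shape(1920, 1080))  # a wide image that hits the 1333 cap
```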

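FAQ

Q: The bounding-box error in the precision test is slightly above 1%?

A: This is expected. The NPU and CPU use different numeric precision (bfloat16 vs float32), which causes small differences in the predicted boxes. Consider relaxing the box error threshold to 2%. The logits (classification) error is usually within 1%.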

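Q: No objects are detected?

A: Check whether the image contains objects the model can recognize. The COCO dataset covers 91 classes (people, vehicles, animals, and so on). For simple images you may need to lower threshold (default 0.7).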
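
To see why lowering the threshold surfaces more detections: each query's logits are passed through a softmax, the trailing "no object" class is dropped, and the query is kept only if its best remaining probability exceeds the threshold. The pure-Python sketch below follows that logic on made-up logits; it is an illustration, not the library implementation, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def keep_detections(query_logits, threshold=0.7):
    """Return (query_index, class_id, score) for confident queries.

    The last logit of each query is treated as "no object" and ignored.
    """
    kept = []
    for index, logits in enumerate(query_logits):
        probs = softmax(logits)[:-1]
        score = max(probs)
        if score > threshold:
            kept.append((index, probs.index(score), score))
    return kept


# Made-up logits: 3 queries over 2 real classes plus "no object".
toy_logits = [[4.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 0.0, 5.0]]
print(len(keep_detections(toy_logits, threshold=0.7)))  # 1 query passes
print(len(keep_detections(toy_logits, threshold=0.4)))  # 2 queries pass
```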

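Q: Inference is slower than on CPU?

A: The first run on the NPU incurs operator compilation overhead; subsequent runs are faster. For small images the CPU may still be faster. Batching is recommended to improve NPU utilization.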
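
To keep compilation overhead out of a measurement, run a few untimed warm-up calls first and average over several timed calls. The helper below is a sketch with hypothetical names; on Ascend, passing `torch.npu.synchronize` as `sync` is assumed to be the right way to wait for queued device work before reading the clock.

```python
import time


def timed_average(fn, warmup=2, runs=5, sync=None):
    """Average wall-clock seconds per call of fn, measured after warm-up calls.

    sync, if given, is called before each clock read so that queued device
    work has finished.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / runs


# Stand-in workload; replace with a function that runs model(**inputs).
print(f"{timed_average(lambda: sum(range(10000))):.6f} s per call")
```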

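Test Log

The full test log is saved in log.txt. It includes:

Model load time and weight loading status
Inference time and detection results for each image
CPU vs NPU precision comparison data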

References

Original model: https://huggingface.co/facebook/detr-resnet-50
DETR paper: https://arxiv.org/abs/2005.12872
HuggingFace Transformers: https://huggingface.co/transformers

License

This project is licensed under Apache-2.0.