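DETR-ResNet-50 (DEtection TRansformer) is an end-to-end object detection model from Facebook. It uses ResNet-50 as the visual backbone and a Transformer encoder-decoder to perform object detection. The model is trained on the COCO 2017 dataset and can detect 91 object categories.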
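
Directory structure:

    detr-resnet-50-ascend/
    ├── inference.py      # Inference test script
    ├── log.txt           # Test log
    ├── README.md         # This document
    └── test_images/      # Test images
        ├── savanna.jpg   # Scene image 1
        └── football.jpg  # Scene image 2

Enter the container:

    docker exec -it test-modelagent bash

Load the Ascend toolkit environment:

    source /usr/local/Ascend/ascend-toolkit/set_env.sh

The model files are located in the `/data/ysws/agentsp/5-15/detr-resnet-50/` directory.

Install the Python dependencies:

    pip install transformers torch_npu pillow

Run the inference script to perform object detection:

    cd /data/ysws/agentsp/5-15/detr-resnet-50-ascend/
    # Use the default device selection (auto)
    python3 inference.py --mode inference
    # Or select the NPU explicitly
    python3 inference.py --mode inference --device npu:0

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:

    cd /data/ysws/agentsp/5-15/detr-resnet-50-ascend/
    python3 inference.py --mode precision_test

| Parameter | Description | Default |
|---|---|---|
| `--mode` | Test mode: `inference` or `precision_test` | `inference` |
| `--device` | Device to run on: `npu:0`, `cuda:0`, `cpu`, or `auto` | `auto` |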
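
The source of `inference.py` is not shown here, so the following is only a minimal sketch of how the two flags in the table could be parsed and how `auto` might be resolved to a concrete device. The names `build_parser` and `resolve_device` are hypothetical, and the NPU, CUDA, CPU preference order is an assumption rather than the script's documented behavior.

```python
import argparse


def resolve_device(requested: str) -> str:
    """Map "auto" to a concrete device string; pass explicit choices through."""
    if requested != "auto":
        return requested
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to probe
    try:
        import torch_npu  # noqa: F401  (makes the "npu" device available)
        if torch.npu.is_available():
            return "npu:0"
    except Exception:
        pass  # torch_npu missing or unusable: fall through to CUDA/CPU
    if torch.cuda.is_available():
        return "cuda:0"
    return "cpu"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the parameter table above."""
    parser = argparse.ArgumentParser(description="DETR-ResNet-50 test script")
    parser.add_argument("--mode", choices=["inference", "precision_test"],
                        default="inference", help="test mode")
    parser.add_argument("--device", default="auto",
                        help="npu:0, cuda:0, cpu, or auto")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"mode={args.mode} device={resolve_device(args.device)}")
```

Restricting `--mode` with `choices` makes a typo fail at argument parsing instead of later in the run.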

Precision test results (NPU vs. CPU):

| Metric | Measured | Threshold | Status |
|---|---|---|---|
| Logits relative error | 0.9704% | < 1.00% | PASS |
| Overall assessment | Within normal range | - | PASS |
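
The table does not say how the logits relative error is defined. One common choice, assumed here purely for illustration, is the mean absolute difference divided by the mean absolute reference value. The function name `relative_error` and the sample numbers are made up; they are not taken from `log.txt`.

```python
def relative_error(reference, candidate):
    """Sum of absolute differences divided by the sum of absolute reference values.

    For equal-length inputs this equals mean(|ref - cand|) / mean(|ref|).
    """
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    abs_diff = sum(abs(r - c) for r, c in zip(reference, candidate))
    abs_ref = sum(abs(r) for r in reference)
    if abs_ref == 0:
        raise ValueError("reference values are all zero")
    return abs_diff / abs_ref


# Made-up logits standing in for flattened CPU and NPU outputs.
cpu_logits = [2.0, -1.0, 0.5, 4.0]
npu_logits = [2.01, -1.0, 0.49, 4.02]

err = relative_error(cpu_logits, npu_logits)
status = "PASS" if err < 0.01 else "FAIL"  # 1.00% threshold from the table
print(f"Logits relative error: {err:.4%} -> {status}")
```

With PyTorch tensors on the same device and in the same dtype, the same quantity can be written as `((npu - cpu).abs().mean() / cpu.abs().mean()).item()`.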

Performance:

| Operation | Time |
|---|---|
| NPU inference time (800x800 input) | ~7.4s |
| CPU inference time (800x800 input) | ~6.3s |
| Average inference time per image | ~3.7s |
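
The tensor shape the model actually sees depends on the preprocessor's resize rule (`shortest_edge` 800, `longest_edge` 1333 in `preprocessor_config.json`), not only on a nominal 800x800 size. The sketch below is a simplified approximation of that rule, with the hypothetical name `resized_shape`; the library's own rounding may differ by a pixel in edge cases. It reproduces the `football.jpg` log entry, where a (600, 831) image becomes a `[1, 3, 1108, 800]` input.

```python
def resized_shape(width, height, shortest_edge=800, longest_edge=1333):
    """Scale so the short side reaches shortest_edge, capped by longest_edge.

    Returns (new_width, new_height). Aspect ratio is preserved up to rounding.
    """
    short_side = min(width, height)
    long_side = max(width, height)
    # Use the smaller of the two scale factors so neither limit is exceeded.
    scale = min(shortest_edge / short_side, longest_edge / long_side)
    return round(width * scale), round(height * scale)


# PIL reports size as (width, height); tensors are laid out as (N, C, H, W).
new_w, new_h = resized_shape(600, 831)
print(f"Input shape: [1, 3, {new_h}, {new_w}]")
```

For a very wide or very tall image the `longest_edge` cap takes over, so the short side ends up below 800.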

Sample log output:

    Processing: football.jpg - Football match scene
    Image size: (600, 831)
    Input shape: torch.Size([1, 3, 1108, 800])
    Inference time: 0.054s
    Detected 1 objects:
      - person: 0.896 at [471.65, 58.11, 563.04, 132.31]

Single-image inference:

    import torch
    import torch_npu  # noqa: F401  (makes the "npu" device available)
    from PIL import Image
    from transformers import DetrImageProcessor, DetrForObjectDetection

    MODEL_DIR = "/data/ysws/agentsp/5-15/detr-resnet-50"

    # Load the processor and model, then move the model to the NPU
    processor = DetrImageProcessor.from_pretrained(MODEL_DIR)
    model = DetrForObjectDetection.from_pretrained(MODEL_DIR)
    model = model.to("npu:0")
    model.eval()

    # Preprocess the image and move the tensors to the same device
    image = Image.open("test.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    inputs = {k: v.to("npu:0") for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    # PIL size is (width, height); post-processing expects (height, width)
    target_sizes = torch.tensor([image.size[::-1]]).to("npu:0")
    results = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=0.7
    )[0]

    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        label_name = model.config.id2label.get(label.item(), "unknown")
        print(f"Detected {label_name} with confidence {score:.3f} at {box.tolist()}")

Batch inference:

    import torch
    import torch_npu  # noqa: F401  (makes the "npu" device available)
    from PIL import Image
    from transformers import DetrImageProcessor, DetrForObjectDetection

    MODEL_DIR = "/data/ysws/agentsp/5-15/detr-resnet-50"

    processor = DetrImageProcessor.from_pretrained(MODEL_DIR)
    model = DetrForObjectDetection.from_pretrained(MODEL_DIR).to("npu:0")
    model.eval()

    # Load several images and preprocess them as one batch
    image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt")
    inputs = {k: v.to("npu:0") for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    # One (height, width) pair per image, in the same order as the batch
    target_sizes = torch.tensor([img.size[::-1] for img in images]).to("npu:0")
    results = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=0.7
    )

    for i, result in enumerate(results):
        print(f"\nImage {i+1}: {len(result['scores'])} objects detected")

Model components:

| Component | Description |
|---|---|
| backbone | ResNet-50 convolutional encoder |
| encoder | 6-layer Transformer encoder |
| decoder | 6-layer Transformer decoder (100 object queries) |
| class_labels_classifier | 91-class classifier |
| bbox_predictor | Bounding-box prediction MLP |
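
The box head predicts boxes in normalized (center_x, center_y, width, height) form, and `post_process_object_detection` turns them into absolute (x_min, y_min, x_max, y_max) pixel coordinates using `target_sizes`. The following pure-Python sketch illustrates that conversion for a single box. The function name `cxcywh_to_xyxy` and the sample box are illustrative only, not model output.

```python
def cxcywh_to_xyxy(box, image_height, image_width):
    """Convert a normalized (cx, cy, w, h) box to absolute (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    x_min = (cx - 0.5 * w) * image_width
    y_min = (cy - 0.5 * h) * image_height
    x_max = (cx + 0.5 * w) * image_width
    y_max = (cy + 0.5 * h) * image_height
    return [x_min, y_min, x_max, y_max]


# A made-up centered box covering 20% of the width and 40% of the height
# of a 600x831 (width x height) image.
abs_box = cxcywh_to_xyxy([0.5, 0.5, 0.2, 0.4], image_height=831, image_width=600)
print([round(v, 2) for v in abs_box])
```

This is why `target_sizes` must hold the original (height, width) of each image: passing the resized tensor shape instead would scale the boxes to the wrong coordinate system.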
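Key parameters extracted from `config.json`:

    {
      "model_type": "detr",
      "d_model": 256,
      "decoder_layers": 6,
      "decoder_attention_heads": 8,
      "num_queries": 100,
      "backbone": "resnet50"
    }

Extracted from `preprocessor_config.json`:

    {
      "do_normalize": true,
      "do_resize": true,
      "image_mean": [0.485, 0.456, 0.406],
      "image_std": [0.229, 0.224, 0.225],
      "size": {"shortest_edge": 800, "longest_edge": 1333}
    }

A: This is expected. The NPU and CPU use different numerical precision (bfloat16 vs. float32), which causes small differences in the bounding-box predictions. Consider relaxing the box error threshold to 2%. The error in the classification logits is usually within 1%.

A: Check whether the image contains objects the model can recognize. The COCO dataset covers 91 object categories (people, vehicles, animals, and so on). For simple images, you may need to lower `threshold` (default 0.7).

A: The first run on the NPU incurs operator compilation overhead, and subsequent inferences are faster. For small images, the CPU may be faster. Batching is recommended to improve NPU utilization.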
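
Because the first NPU run includes one-time operator compilation, a timing measurement is more representative when warm-up calls are excluded. The helper below is a minimal sketch of that idea; `time_inference` and its parameters are hypothetical names, and the dummy workload only stands in for a real model call.

```python
import time


def time_inference(run_once, warmup=1, repeats=3, sync=None):
    """Return average wall-clock seconds per call, excluding warm-up calls.

    `sync` is an optional callable that blocks until queued device work is done.
    """
    for _ in range(warmup):
        run_once()  # absorbs one-time costs such as operator compilation
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    if sync is not None:
        sync()  # wait for pending device work before stopping the clock
    return (time.perf_counter() - start) / repeats


# Dummy CPU workload used only to demonstrate the helper.
avg_seconds = time_inference(lambda: sum(range(10_000)), warmup=1, repeats=3)
print(f"Average time per call: {avg_seconds:.6f}s")
```

With the model from the examples above, a call might look like `time_inference(lambda: model(**inputs), sync=torch.npu.synchronize)` inside a `torch.no_grad()` block, assuming `torch_npu` is imported so that `torch.npu` is available.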

The complete test log is saved in `log.txt`.

This project is licensed under the Apache-2.0 license.