gcw_coj3XaOd/cv_yolox-pai_hand-detection

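Hand Detection Model

Given an input image, the model detects the hand regions in it and outputs a bounding box, confidence score, and label for every detected hand.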

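Model Description

This model is intended for hand detection: it returns the bounding-box coordinates, confidence score, and label of each hand in an image. It was trained with the YOLOX-PAI model from Alibaba Cloud's PAI-EasyCV framework on a combined dataset of TV-hand and coco-hand-big. YOLOX-PAI improves on the original YOLOX in four areas: Backbone (repvgg backbone), Neck (gsconv/asff), Head (toods/rtoods), and Loss (siou/giou). Combined with PAI-Blade, the inference acceleration framework developed in-house by Alibaba's computing platform PAI, it outperforms YOLOv6, the SOTA in the 40~50 mAP range at the time of writing, in both speed and accuracy. For details on YOLOX-PAI, see https://github.com/alibaba/EasyCV/blob/master/docs/source/tutorials/yolox.md.

For the YOLOX-PAI paper, see https://arxiv.org/abs/2208.13040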

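Usage and Scope

Usage:

Input any image; the model returns the bounding-box coordinates, confidence scores, and labels of all hands in the image.

Target scenarios:

Hand keypoint detection.
Gesture recognition.
Hand reconstruction.
Natural gesture-based interaction.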

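How to Use

On the ModelScope framework, provide an input image and run the hand detection task through a simple Pipeline call.

Inference Code Example

You can also refer to the sample code in tests/pipelines/test_hand_detection.py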

# numpy >= 1.20
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

model_id = 'damo/cv_yolox-pai_hand-detection'
hand_detection = pipeline(Tasks.domain_specific_object_detection, model=model_id)
output = hand_detection('https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/hand_detection.jpg')

# the output contains boxes, scores and labels
print(output)
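The printed result can be filtered by score before further use. The sketch below assumes, based on the comment in the example above, that the output is a dictionary with parallel `boxes`, `scores`, and `labels` entries and that each box is `[x1, y1, x2, y2]`. The sample values are made up for illustration, and `filter_detections` is a hypothetical helper, not part of ModelScope.

```python
def filter_detections(output, min_score=0.5):
    """Keep detections whose score is at least min_score.

    Assumes `output` has parallel 'boxes', 'scores' and 'labels' sequences
    (an assumption based on the comment in the pipeline example).
    """
    kept = []
    for box, score, label in zip(output['boxes'], output['scores'], output['labels']):
        if score >= min_score:
            # Round coordinates to whole pixels for display
            kept.append({'box': [round(v) for v in box], 'score': score, 'label': label})
    return kept


# Made-up sample standing in for a real pipeline result
sample_output = {
    'boxes': [[957.2, 363.1, 1005.4, 406.0], [12.0, 20.0, 48.5, 70.2]],
    'scores': [0.91, 0.32],
    'labels': ['hand', 'hand'],
}

for det in filter_detections(sample_output, min_score=0.5):
    print(det['label'], det['score'], det['box'])
```

With a real pipeline result, pass `output` in place of `sample_output` after confirming its keys with `print(output.keys())`.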

Ascend NPU Deployment

This repository provides a complete adaptation of the YOLOX-PAI hand detection model for Huawei Ascend NPUs.

Requirements

pip install torch torchvision torch_npu
pip install opencv-python numpy onnxruntime

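Directory Structure

yolox-hand-detection/
├── README.md                 # This file: adaptation and deployment guide
├── inference.py              # Model definition + inference code
├── verify_accuracy.py        # CPU vs NPU accuracy comparison script
├── benchmark.py              # Performance benchmark script
├── model_files/
│   ├── pytorch_model.pt      # Original weights (downloaded from ModelScope)
│   └── yolox_hand_det.onnx   # Exported ONNX model (opset 18)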

NPU Inference

# Single image
python inference.py --device npu --image model_files/resources/1.jpg --save_result

# Batch-process a directory
python inference.py --device npu --image_dir model_files/resources/ --save_result

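Python API

import torch
import torch_npu  # registers the 'npu' device with PyTorch

from inference import load_model, preprocess, postprocess

# Load the weights onto the first NPU
model = load_model('model_files/pytorch_model.pt', device='npu:0')
# Preprocess on the host, then move the input tensor to the NPU
img_tensor, img_shape, pad_info = preprocess('image.jpg')
img_tensor = img_tensor.to('npu:0')

with torch.no_grad():
    outputs = model(img_tensor)

# Move the head outputs back to the CPU before post-processing
outputs_cpu = [{k: v.cpu() for k, v in o.items()} for o in outputs]
detections = postprocess(outputs_cpu, img_shape, conf_thre=0.3, nms_thre=0.65)
print(detections)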
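Between `preprocess` and `postprocess`, boxes predicted on the 640×640 network input have to be mapped back to original image coordinates. The sketch below shows one common way to do this, assuming a YOLOX-style resize that preserves the aspect ratio and pads on the right and bottom. The actual logic in inference.py may differ, and `letterbox_params` and `unletterbox_box` are hypothetical names.

```python
def letterbox_params(img_h, img_w, target=640):
    """Return (scale, new_h, new_w) for an aspect-preserving resize into target x target.

    Assumption: the resized image sits at the top-left corner and the
    remainder is padding, as in the reference YOLOX preprocessing.
    """
    scale = min(target / img_h, target / img_w)
    return scale, int(img_h * scale), int(img_w * scale)


def unletterbox_box(box, scale, img_h, img_w):
    """Map an [x1, y1, x2, y2] box from network-input pixels to original pixels."""
    x1, y1, x2, y2 = (v / scale for v in box)
    # Clip to the image bounds so boxes never extend past the original image
    x1, x2 = max(0.0, min(x1, img_w)), max(0.0, min(x2, img_w))
    y1, y2 = max(0.0, min(y1, img_h)), max(0.0, min(y2, img_h))
    return [x1, y1, x2, y2]


scale, new_h, new_w = letterbox_params(img_h=720, img_w=1280)
print(scale, new_h, new_w)  # 0.5 360 640
print(unletterbox_box([100, 50, 200, 150], scale, 720, 1280))  # [200.0, 100.0, 400.0, 300.0]
```

With top-left placement no offset needs to be subtracted; a centered letterbox would additionally subtract the left and top padding before dividing by the scale.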

ONNX Inference

import onnxruntime as ort
from inference import preprocess, postprocess

sess = ort.InferenceSession('model_files/yolox_hand_det.onnx')
img_tensor, img_shape, _ = preprocess('image.jpg')
outputs = sess.run(None, {'input': img_tensor.numpy()})
# outputs: [cls_0, reg_0, obj_0, cls_1, reg_1, obj_1, cls_2, reg_2, obj_2]
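The nine ONNX outputs are per-level head tensors, so boxes still have to be decoded before NMS. The sketch below shows the standard YOLOX decoding for a single grid cell, assuming that the outputs are un-activated logits and that the three levels use strides 8, 16, and 32. Whether `postprocess` in inference.py does exactly this is an assumption, and `decode_cell` is a hypothetical helper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(reg, obj_logit, cls_logit, grid_x, grid_y, stride):
    """Decode one grid cell of a YOLOX-style head.

    reg is (dx, dy, log_w, log_h) relative to the cell. Returns
    ([x1, y1, x2, y2], score) in network-input pixels.
    Assumption: obj/cls are raw logits, as in the reference YOLOX head.
    """
    dx, dy, log_w, log_h = reg
    # Box center: cell offset plus cell index, scaled by the stride
    cx = (dx + grid_x) * stride
    cy = (dy + grid_y) * stride
    # Box size: exponentiated and scaled by the stride
    w = math.exp(log_w) * stride
    h = math.exp(log_h) * stride
    # Final score: objectness times class probability
    score = sigmoid(obj_logit) * sigmoid(cls_logit)
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], score


box, score = decode_cell(reg=(0.5, 0.5, 0.0, 0.0), obj_logit=0.0, cls_logit=0.0,
                         grid_x=10, grid_y=20, stride=8)
print(box, score)  # [80.0, 160.0, 88.0, 168.0] 0.25
```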

Accuracy Verification

Run python verify_accuracy.py --device npu to reproduce the accuracy comparison below.

Comparison	max_diff	cosine_sim	Result
stride_8 cls	9.2e-04	1.000000	✓
stride_8 reg	2.0e-03	1.000000	✓
stride_8 obj	1.7e-02	1.000000	✓
stride_16 cls	7.1e-04	0.999999	✓
stride_16 reg	1.1e-03	1.000000	✓
stride_16 obj	4.5e-03	1.000000	✓
stride_32 cls	1.3e-03	1.000000	✓
stride_32 reg	1.0e-03	1.000000	✓
stride_32 obj	1.1e-02	1.000000	✓
Top-10 scores	9.4e-05	N/A	✓
PyTorch vs ONNX	7e-05	N/A	✓

Top-10 detection scores, CPU vs. NPU:

CPU: [0.5558, 0.5498, 0.5424, 0.5327, 0.5274, 0.5271, 0.5244, 0.5212, 0.5189, 0.5181]
NPU: [0.5557, 0.5498, 0.5424, 0.5327, 0.5274, 0.5270, 0.5245, 0.5212, 0.5189, 0.5181]
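As an illustration of the two metrics, the sketch below computes the maximum absolute difference and the cosine similarity for the two score lists above. It is not the code in verify_accuracy.py, and because the listed scores are rounded to four decimals, the result only approximates the corresponding table entry.

```python
import math

cpu_scores = [0.5558, 0.5498, 0.5424, 0.5327, 0.5274, 0.5271, 0.5244, 0.5212, 0.5189, 0.5181]
npu_scores = [0.5557, 0.5498, 0.5424, 0.5327, 0.5274, 0.5270, 0.5245, 0.5212, 0.5189, 0.5181]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(f"max_diff   = {max_abs_diff(cpu_scores, npu_scores):.1e}")
print(f"cosine_sim = {cosine_similarity(cpu_scores, npu_scores):.6f}")
```

The same two functions apply to flattened head tensors, which is presumably how the per-stride rows are obtained.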

Performance Comparison

Run python benchmark.py --device cpu and python benchmark.py --device npu to reproduce.

Metric	CPU (HiSilicon)	NPU (Ascend 910B)	Speedup
Average latency	937.20 ms	63.0 ms	14.9x
FPS	1.1 img/s	15.9 img/s	14.5x
Input size	1×3×640×640	1×3×640×640	N/A
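At batch size 1, throughput and speedup follow directly from the average latency. The short calculation below uses only the two latency figures from the table; `fps_from_latency` is a hypothetical helper and not part of benchmark.py.

```python
def fps_from_latency(latency_ms):
    """Throughput in images per second for batch size 1."""
    return 1000.0 / latency_ms


cpu_ms, npu_ms = 937.20, 63.0  # average latencies from the table above

cpu_fps = fps_from_latency(cpu_ms)
npu_fps = fps_from_latency(npu_ms)
speedup = cpu_ms / npu_ms

print(f"CPU: {cpu_fps:.1f} img/s, NPU: {npu_fps:.1f} img/s, speedup: {speedup:.1f}x")
# CPU: 1.1 img/s, NPU: 15.9 img/s, speedup: 14.9x
```

A ratio taken from the rounded FPS values (15.9 / 1.1) comes out slightly lower, at about 14.5x.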

Sample Inference Output (conf_thre=0.5, nms_thre=0.45)

$ python inference.py --device npu --image model_files/resources/1.jpg --save_result --conf_thre 0.5 --nms_thre 0.45

Using NPU device: Ascend910_9362
Loading model from model_files/pytorch_model.pt...
  Re-parameterizing (fusing conv+bn)...
  Warming up...
  Warmup done.

  Image: model_files/resources/1.jpg
  Detections: 15 hand(s)
  Time: 63.0ms
    hand 0.556 [957,363,1005,406]
    hand 0.550 [969,401,1019,444]
    hand 0.542 [971,427,1015,466]
    hand 0.533 [990,752,1022,752]
    hand 0.527 [970,452,1016,497]
    hand 0.527 [931,348,976,397]
    hand 0.524 [513,437,551,484]
    hand 0.521 [516,464,549,509]
    hand 0.519 [540,439,576,482]
    hand 0.518 [974,479,1011,521]
    hand 0.511 [643,466,679,509]
    hand 0.507 [985,466,1024,508]
    hand 0.506 [530,466,561,508]
    hand 0.506 [660,493,688,532]
    hand 0.503 [958,752,1004,752]
  Result saved to output/result_1.jpg

Results saved to output/results.json
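The `--conf_thre` and `--nms_thre` flags control which candidate boxes survive. The sketch below illustrates the usual meaning of the two thresholds with a plain greedy NMS; it is a generic illustration under that assumption, not the implementation in inference.py, and `iou` and `filter_and_nms` are hypothetical helpers.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_thre=0.5, nms_thre=0.45):
    """Drop low-score boxes, then greedily suppress overlapping ones.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thre),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= nms_thre for j in keep):
            keep.append(i)
    return keep


boxes = [[100, 100, 200, 200], [105, 105, 205, 205], [300, 300, 350, 350], [0, 0, 10, 10]]
scores = [0.90, 0.80, 0.70, 0.30]
print(filter_and_nms(boxes, scores))  # [0, 2]
```

Raising `conf_thre` removes more low-confidence boxes, while lowering `nms_thre` suppresses neighbouring boxes more aggressively, which matters in crowded scenes like the one in the sample output.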

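Detection Result Visualization

Model Limitations and Possible Bias

When hands in the input image are severely truncated or occluded, the model may produce false detections.
Under motion blur caused by fast movement, the model may produce false hand detections.

Training Data

The training data comes from the public datasets COCO-HAND_Big and TV_HAND. The authors have consolidated them and converted them to COCO format; the result is available at https://www.modelscope.cn/datasets/modelscope/hand_detection_dataset/summary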

Evaluation and Results

Evaluation Metrics

The model's evaluation metrics on the public test dataset are as follows:

Input size	AR@1	AR@10	AR@100	AR@100 (small)	AR@100 (medium)	AR@100 (large)
640x640x3	0.2454	0.4295	0.4334	0.3884	0.5154	0.4978

Input size	mAP	mAP@.50IOU	mAP@.75IOU	mAP (small)	mAP (medium)	mAP (large)
640x640x3	0.3526	0.7294	0.3035	0.3002	0.4414	0.4218

Model Results

Hand detection results

Citation

@article{DBLP:journals/corr/abs-2107-08430,
  title     = {YOLOX: Exceeding YOLO Series in 2021},
  author    = {Zheng Ge and Songtao Liu and Feng Wang and Zeming Li and Jian Sun},
  journal   = {arXiv preprint arXiv:2107.08430},
  year      = {2021}
}

@article{DBLP:journals/corr/abs-2208-13040,
  title     = {YOLOX-PAI: An Improved YOLOX Version by PAI},
  author    = {Zou, X. and Wu, Z. and Zhou, W. and others},
  journal   = {arXiv preprint arXiv:2208.13040},
  year      = {2022}
}

Clone via HTTP

git clone https://www.modelscope.cn/damo/cv_yolox-pai_hand-detection.git