Takes an image as input, detects the hand regions in it, and outputs a bounding box, confidence score and label for every detected hand.
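This model is intended for hand detection: it locates the hands in an image and returns their box coordinates, confidence scores and labels. It is a YOLOX-PAI model from Alibaba Cloud's PAI-EasyCV framework, trained on a combination of the TV-hand and coco-hand-big datasets. YOLOX-PAI improves the original YOLOX in four areas: backbone (repvgg backbone), neck (gsconv/asff), head (toods/rtoods) and loss (siou/giou). Combined with PAI-Blade, the inference acceleration framework developed in-house by Alibaba's PAI computing platform, it is reported by its authors to outperform YOLOv6, the state of the art in the 40 to 50 mAP range at the time, in both speed and accuracy. For details on YOLOX-PAI, see https://github.com/alibaba/EasyCV/blob/master/docs/source/tutorials/yolox.md.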
YOLOX-PAI paper: https://arxiv.org/abs/2208.13040
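GIoU, one of the box-regression losses listed above, extends IoU with a penalty based on the smallest box C enclosing both boxes A and B: GIoU = IoU - (|C| - |A ∪ B|) / |C|, and the loss is 1 - GIoU. Unlike plain IoU, it still gives a useful signal when the boxes do not overlap. The following is a minimal pure-Python sketch of the formula, assuming boxes in (x1, y1, x2, y2) format with positive area; it illustrates the math and is not the EasyCV implementation.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle, clamped to zero when the boxes do not overlap
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box C that encloses both A and B
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    enclose = cw * ch
    # Penalize the part of C that is covered by neither box
    return iou - (enclose - union) / enclose


def giou_loss(pred, target):
    """GIoU loss in [0, 2]; 0 means a perfect match."""
    return 1.0 - giou(pred, target)


print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # about 1.333: disjoint boxes still get a gradient signal
```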
Usage:
Target scenarios:
With the ModelScope framework, hand detection on an input image takes a single Pipeline call.
See also the example code in tests/pipelines/test_hand_detection.py.
# numpy >= 1.20
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
model_id = 'damo/cv_yolox-pai_hand-detection'
hand_detection = pipeline(Tasks.domain_specific_object_detection, model=model_id)
output = hand_detection('https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/hand_detection.jpg')
# the output contains boxes, scores and labels
print(output)
This repository provides a complete port of the YOLOX-PAI hand detection model to Huawei Ascend NPUs.
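The NPU examples in this repository pass device='npu:0' explicitly. When the same script should also run on a machine without an Ascend NPU, one option is to choose the device at runtime. The following is a minimal sketch, assuming that importing torch_npu registers the torch.npu namespace with an is_available() function (verify this against your torch_npu version); pick_device is a hypothetical helper, not part of inference.py.

```python
def pick_device(index=0):
    """Return 'npu:<index>' when an Ascend NPU is usable, otherwise 'cpu'."""
    try:
        import torch
        import torch_npu  # noqa: F401  (assumed: importing it registers torch.npu)
    except (ImportError, OSError):
        # torch, torch_npu or the CANN runtime libraries are missing: use the CPU
        return "cpu"
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        return f"npu:{index}"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```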
Requirements
pip install torch torchvision torch_npu
pip install opencv-python numpy onnxruntime
Directory structure
yolox-hand-detection/
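├── README.md # this file: adaptation and deployment guide
├── inference.py # model definition + inference code
├── verify_accuracy.py # CPU vs NPU accuracy comparison script
├── benchmark.py # performance benchmark script
├── model_files/
│ ├── pytorch_model.pt # original weights (downloaded from ModelScope)
│ └── yolox_hand_det.onnx # exported ONNX model (opset 18)
NPU inference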
# single image
python inference.py --device npu --image model_files/resources/1.jpg --save_result
# batch-process every image in a directory
python inference.py --device npu --image_dir model_files/resources/ --save_result
Python API
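import torch
import torch_npu  # noqa: F401  registers the 'npu' device type with PyTorch
from inference import load_model, preprocess, postprocess
# load the weights onto the first Ascend NPU
model = load_model('model_files/pytorch_model.pt', device='npu:0')
# preprocess returns the input tensor, the image shape and padding info
img_tensor, img_shape, pad_info = preprocess('image.jpg')
img_tensor = img_tensor.to('npu:0')
with torch.no_grad():
    outputs = model(img_tensor)
# move every output tensor back to the CPU before postprocessing
outputs_cpu = [{k: v.cpu() for k, v in o.items()} for o in outputs]
detections = postprocess(outputs_cpu, img_shape, conf_thre=0.3, nms_thre=0.65)
print(detections)
ONNX inference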
import onnxruntime as ort
from inference import preprocess, postprocess
sess = ort.InferenceSession('model_files/yolox_hand_det.onnx', providers=['CPUExecutionProvider'])
img_tensor, img_shape, _ = preprocess('image.jpg')
outputs = sess.run(None, {'input': img_tensor.numpy()})
# outputs: [cls_0, reg_0, obj_0, cls_1, reg_1, obj_1, cls_2, reg_2, obj_2]
Accuracy verification
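The comparison table below reports one row per stride and head, while the ONNX session returns a flat list ordered [cls_0, reg_0, obj_0, cls_1, ...]. The following is a minimal sketch for regrouping that list, assuming output levels 0, 1 and 2 correspond to strides 8, 16 and 32 (consistent with the stride_8/16/32 rows of the table, but not confirmed by inference.py); group_by_stride is a hypothetical helper.

```python
def group_by_stride(outputs, strides=(8, 16, 32)):
    """Regroup a flat [cls_0, reg_0, obj_0, cls_1, ...] list into per-stride dicts."""
    keys = ("cls", "reg", "obj")
    expected = len(keys) * len(strides)
    if len(outputs) != expected:
        raise ValueError(f"expected {expected} outputs, got {len(outputs)}")
    grouped = {}
    for level, stride in enumerate(strides):
        # Each level contributes three consecutive tensors: cls, reg, obj
        chunk = outputs[level * len(keys):(level + 1) * len(keys)]
        grouped[f"stride_{stride}"] = dict(zip(keys, chunk))
    return grouped


# With the ONNX example above this would be: heads = group_by_stride(outputs)
names = ["cls_0", "reg_0", "obj_0", "cls_1", "reg_1", "obj_1", "cls_2", "reg_2", "obj_2"]
print(group_by_stride(names)["stride_16"])  # {'cls': 'cls_1', 'reg': 'reg_1', 'obj': 'obj_1'}
```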
Run python verify_accuracy.py --device npu to reproduce the accuracy comparison below.
| Compared output | max_diff | cosine_sim | Result |
|---|---|---|---|
| stride_8 cls | 9.2e-04 | 1.000000 | ✓ |
| stride_8 reg | 2.0e-03 | 1.000000 | ✓ |
| stride_8 obj | 1.7e-02 | 1.000000 | ✓ |
| stride_16 cls | 7.1e-04 | 0.999999 | ✓ |
| stride_16 reg | 1.1e-03 | 1.000000 | ✓ |
| stride_16 obj | 4.5e-03 | 1.000000 | ✓ |
| stride_32 cls | 1.3e-03 | 1.000000 | ✓ |
| stride_32 reg | 1.0e-03 | 1.000000 | ✓ |
| stride_32 obj | 1.1e-02 | 1.000000 | ✓ |
| Top-10 scores | 9.4e-05 | — | ✓ |
| PyTorch vs ONNX | 7e-05 | — | ✓ |
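The max_diff and cosine_sim columns compare each CPU output with the corresponding NPU output. The following is a minimal sketch of both metrics on flattened values, assuming max_diff means the maximum absolute element-wise difference (verify_accuracy.py may define it differently) and using plain Python lists in place of tensors.

```python
import math


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Top-10 detection scores as listed in this section (rounded to 4 decimals)
cpu = [0.5558, 0.5498, 0.5424, 0.5327, 0.5274, 0.5271, 0.5244, 0.5212, 0.5189, 0.5181]
npu = [0.5557, 0.5498, 0.5424, 0.5327, 0.5274, 0.5270, 0.5245, 0.5212, 0.5189, 0.5181]
print(f"max_diff={max_abs_diff(cpu, npu):.1e}, cosine_sim={cosine_similarity(cpu, npu):.6f}")
```

On these rounded scores the maximum difference comes out at about 1e-04 rather than the 9.4e-05 in the table, because the listed values are rounded to four decimals.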
Top-10 detection scores, CPU vs NPU:
CPU: [0.5558, 0.5498, 0.5424, 0.5327, 0.5274, 0.5271, 0.5244, 0.5212, 0.5189, 0.5181]
NPU: [0.5557, 0.5498, 0.5424, 0.5327, 0.5274, 0.5270, 0.5245, 0.5212, 0.5189, 0.5181]
Performance comparison
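Latency numbers on an accelerator are only meaningful if warm-up runs are excluded and the device has finished its queued work before the clock is read, because NPU kernels are launched asynchronously. The following is a minimal timing harness sketch, not the code in benchmark.py. run_once and sync are hypothetical callables: on an NPU, run_once would wrap model(img_tensor) under torch.no_grad(), and sync would be torch.npu.synchronize (an assumption to check against your torch_npu version).

```python
import statistics
import time


def benchmark(run_once, sync=lambda: None, warmup=5, iters=20):
    """Return (mean latency in ms, throughput in img/s) for a batch-1 callable."""
    # Warm-up runs absorb one-off costs such as kernel compilation and allocation
    for _ in range(warmup):
        run_once()
    sync()
    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        sync()  # wait for queued device work before stopping the clock
        times_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(times_ms)
    # With batch size 1, throughput is the inverse of the mean latency
    return mean_ms, 1000.0 / mean_ms


def speedup(baseline_ms, accelerated_ms):
    """Speedup as a ratio of mean latencies (equal to the FPS ratio at batch size 1)."""
    return baseline_ms / accelerated_ms


# Dummy CPU workload standing in for model(img_tensor)
mean_ms, fps = benchmark(lambda: sum(i * i for i in range(20000)))
print(f"{mean_ms:.2f} ms, {fps:.1f} img/s")
print(f"{speedup(937.20, 63.0):.1f}x")  # 14.9x for the mean latencies reported in this section
```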
Run python benchmark.py --device cpu and python benchmark.py --device npu to reproduce the results below.
| Metric | CPU (HiSilicon) | NPU (Ascend 910B) | Speedup |
|---|---|---|---|
| Mean latency | 937.20 ms | 63.0 ms | 14.9x |
| Throughput (FPS) | 1.1 img/s | 15.9 img/s | 14.9x |
| Input size | 1×3×640×640 | 1×3×640×640 | — |
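The conf_thre and nms_thre arguments of postprocess (and the --conf_thre/--nms_thre flags in the example that follows) control the last detection stage: candidates scoring below conf_thre are dropped, and among boxes whose IoU exceeds nms_thre only the highest-scoring one is kept. The following is a minimal pure-Python sketch of that score filter plus greedy NMS, assuming boxes in (x1, y1, x2, y2) pixel coordinates and one score per box. YOLOX-style postprocessing typically forms that score as objectness × class probability first; the actual postprocess in inference.py may differ in details.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes; 0.0 for degenerate pairs."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_thre=0.3, nms_thre=0.65):
    """Return indices of the kept detections, highest score first."""
    # 1. Drop low-confidence candidates
    candidates = [i for i, s in enumerate(scores) if s >= conf_thre]
    # 2. Visit the remaining candidates from highest to lowest score
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # 3. Keep a box only if it does not overlap any already-kept box too much
        if all(iou(boxes[i], boxes[k]) <= nms_thre for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
keep = filter_and_nms(boxes, scores, conf_thre=0.5, nms_thre=0.45)
print(keep)  # [0, 2]: box 1 overlaps box 0 (IoU about 0.82), box 3 is below conf_thre
```

Raising nms_thre keeps more overlapping boxes; lowering conf_thre admits more low-score candidates.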
Example inference output (conf_thre=0.5, nms_thre=0.45)
$ python inference.py --device npu --image model_files/resources/1.jpg --save_result --conf_thre 0.5 --nms_thre 0.45
Using NPU device: Ascend910_9362
Loading model from model_files/pytorch_model.pt...
Re-parameterizing (fusing conv+bn)...
Warming up...
Warmup done.
Image: model_files/resources/1.jpg
Detections: 15 hand(s)
Time: 63.0ms
hand 0.556 [957,363,1005,406]
hand 0.550 [969,401,1019,444]
hand 0.542 [971,427,1015,466]
hand 0.533 [990,752,1022,752]
hand 0.527 [970,452,1016,497]
hand 0.527 [931,348,976,397]
hand 0.524 [513,437,551,484]
hand 0.521 [516,464,549,509]
hand 0.519 [540,439,576,482]
hand 0.518 [974,479,1011,521]
hand 0.511 [643,466,679,509]
hand 0.507 [985,466,1024,508]
hand 0.506 [530,466,561,508]
hand 0.506 [660,493,688,532]
hand 0.503 [958,752,1004,752]
Result saved to output/result_1.jpg
Results saved to output/results.json
Visualization of detection results

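The training data comes from the public datasets COCO-HAND_Big and TV_HAND, which the authors have cleaned up and converted to COCO format. It is available at https://www.modelscope.cn/datasets/modelscope/hand_detection_dataset/summary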
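COCO-format annotations store each box as bbox = [x, y, width, height], with (x, y) the top-left corner, whereas the detections printed by inference.py above appear to use [x1, y1, x2, y2] corner coordinates. The following is a minimal sketch for converting between the two, for example when comparing predictions with the dataset annotations. The helper names and the image_id/category_id values in the usage line are hypothetical.

```python
def xyxy_to_xywh(box):
    """[x1, y1, x2, y2] corners -> COCO [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def xywh_to_xyxy(box):
    """COCO [x, y, width, height] -> [x1, y1, x2, y2] corners."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def to_coco_result(image_id, category_id, box_xyxy, score):
    """One entry in the COCO detection-results JSON format."""
    return {
        "image_id": image_id,
        "category_id": category_id,
        "bbox": xyxy_to_xywh(box_xyxy),
        "score": score,
    }


# First detection from the example output, with hypothetical ids
print(to_coco_result(1, 1, [957, 363, 1005, 406], 0.556))
# {'image_id': 1, 'category_id': 1, 'bbox': [957, 363, 48, 43], 'score': 0.556}
```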
Evaluation metrics of the model on the public test set:
| Input size | AR@1 | AR@10 | AR@100 | AR@100 (small) | AR@100 (medium) | AR@100 (large) |
|---|---|---|---|---|---|---|
| 640x640x3 | 0.2454 | 0.4295 | 0.4334 | 0.3884 | 0.5154 | 0.4978 |

| Input size | mAP | mAP@IoU=0.50 | mAP@IoU=0.75 | mAP (small) | mAP (medium) | mAP (large) |
|---|---|---|---|---|---|---|
| 640x640x3 | 0.3526 | 0.7294 | 0.3035 | 0.3002 | 0.4414 | 0.4218 |
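The mAP@IoU=0.50 and mAP@IoU=0.75 columns differ only in how much a detection must overlap a ground-truth box to count as a true positive; under the COCO protocol the headline mAP averages AP over IoU thresholds 0.50, 0.55, ..., 0.95. The following is a simplified single-class sketch of AP at one threshold: greedy matching in descending score order, a monotone precision envelope, and COCO-style 101-point interpolation. It omits COCO details such as crowd regions, area ranges and per-image bookkeeping, so it is an illustration rather than a replacement for pycocotools.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets, gts, iou_thre=0.5):
    """AP for one class: dets = [(score, box)], gts = [box]; returns 0.0 if gts is empty."""
    if not gts:
        return 0.0
    matched, tp_flags = set(), []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        # Match to the best-overlapping ground truth that is still unmatched
        best_j, best_iou = -1, iou_thre
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
        tp_flags.append(best_j >= 0)
    # Cumulative precision and recall along the ranked detections
    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # Precision envelope: make precision non-increasing from right to left
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # 101-point interpolation at recall levels 0.00, 0.01, ..., 1.00
    total = 0.0
    for r in (i / 100 for i in range(101)):
        total += next((p for p, rec in zip(precisions, recalls) if rec >= r), 0.0)
    return total / 101


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (21, 21, 31, 31)), (0.7, (50, 50, 60, 60))]
print(average_precision(dets, gts, 0.50))  # 1.0
print(average_precision(dets, gts, 0.75))  # 51/101, about 0.505: the shifted box (IoU about 0.68) no longer matches
coco_map = sum(average_precision(dets, gts, t / 100) for t in range(50, 100, 5)) / 10
```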

@article{DBLP:journals/corr/abs-2107-08430,
title = {YOLOX: Exceeding YOLO Series in 2021},
author = {Zheng Ge and Songtao Liu and Feng Wang and Zeming Li and Jian Sun},
journal = {arXiv preprint arXiv:2107.08430},
year = {2021}
}
@article{DBLP:journals/corr/abs-2208-13040,
title = {YOLOX-PAI: An Improved YOLOX Version by PAI},
author = {Zou, X. and Wu, Z. and Zhou, W. and others},
journal = {arXiv preprint arXiv:2208.13040},
year = {2022}
}
The ModelScope model repository can be cloned with:
git clone https://www.modelscope.cn/damo/cv_yolox-pai_hand-detection.git