Inference adaptation and deployment guide for the ModelScope model iic/cv_nanodet_face-human-hand-detection on Huawei Ascend NPUs.

| Property | Value |
|---|---|
| Model source | iic/cv_nanodet_face-human-hand-detection |
| Detection classes | body, face, hand |
| Backbone | ShuffleNetV2-1.0x |
| Neck | GhostPAN (Path Aggregation Network with Ghost modules) |
| Detection head | NanoDetPlus (GFL head, reg_max=7) |
| Input size | 320×320 (BGR) |
| Model size | 17 MB (645 parameter tensors) |
| Inference framework | PyTorch + torch_npu |

| Component | Version |
|---|---|
| Python | ≥ 3.9 |
| PyTorch | 2.9.0 |
| torch_npu | 2.9.0.post1 |
| CANN | 8.5.1 |
| OpenCV | ≥ 4.0 |
| NumPy | ≥ 1.20 |
| Hardware | Atlas 800 A2 (Ascend 910B) or a compatible device |
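
The requirements in the table above can be checked quickly from Python. The following is a minimal sketch using only the standard library. The distribution names in `PACKAGES` are assumptions (OpenCV in particular is published under more than one name), and CANN is not a Python package, so it is not covered here.

```python
import sys
from importlib import metadata

# Distribution names are assumptions; adjust them to match your environment.
PACKAGES = ("torch", "torch-npu", "opencv-python", "numpy")


def meets_minimum(found, minimum):
    """Compare version tuples, e.g. (3, 10, 2) against (3, 9)."""
    return tuple(found[:len(minimum)]) >= tuple(minimum)


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("Python >= 3.9:", meets_minimum(sys.version_info, (3, 9)))
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` entry only means that no distribution with that exact name was found; it does not prove the library is missing.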

    # Download via the ModelScope SDK
    pip install modelscope
    python -c "
    from modelscope import snapshot_download
    snapshot_download('iic/cv_nanodet_face-human-hand-detection',
                      cache_dir='./models')
    "

Model weights path: models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin

    # Select the device automatically (NPU preferred)
    python inference.py --image test.jpg

    # Force CPU
    python inference.py --image test.jpg --device cpu

    # Force NPU
    python inference.py --image test.jpg --device npu

    # Custom parameters
    python inference.py \
        --model-path models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin \
        --image test.jpg \
        --device npu \
        --input-size 320 \
        --score-thresh 0.3 \
        --output-dir ./output

CPU vs NPU precision comparison:

    python inference.py --compare

Performance benchmark:

    python inference.py --benchmark --bench-iters 100

Python API:

    from inference import load_model, inference, NanoDetPreProcessor, NanoDetPostProcessor

    # Load the model
    model = load_model('models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin',
                       device='npu:0')

    # Run inference
    labels, bboxes, scores = inference(model, 'test.jpg', device='npu:0',
                                       input_size=320, score_thresh=0.3)

    # Interpreting the results
    # labels: [0, 1, 2] → [body, face, hand]
    # bboxes: [[x1, y1, x2, y2], ...] in pixel coordinates
    # scores: [0.96, 0.89, 0.82] confidence values

Example run on NPU:

    $ python inference.py --image test_face_human_hand_detection.jpg --device npu
    Loading model from: models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin
    Device: npu
    Loaded 645/645 parameters from checkpoint
    Model loaded successfully!
    Warming up...
    Warm-up done.
    Running inference on: test_face_human_hand_detection.jpg
    Inference time: 162.72 ms
    ============================================================
    Detection Results (score > 0.3):
    ============================================================
    Class Score BBox (x1,y1,x2,y2)
    ------------------------------------------------------------
    人体(body) 0.9692 [0.0, 0.1, 368.0, 640.0]
    人脸(face) 0.9165 [127.2, 82.9, 332.5, 366.2]
    人手(hand) 0.8349 [75.5, 286.1, 240.8, 511.5]
    Done! Device=npu, Inference=162.72ms, Detections=3
    Saved result to: output/result_npu.jpg

Example run on CPU:

    $ python inference.py --image test_face_human_hand_detection.jpg --device cpu
    Loading model from: models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin
    Device: cpu
    Loaded 645/645 parameters from checkpoint
    Running inference on: test_face_human_hand_detection.jpg
    Inference time: 74.19 ms
    ============================================================
    Detection Results (score > 0.3):
    ============================================================
    Class Score BBox (x1,y1,x2,y2)
    ------------------------------------------------------------
    人体(body) 0.9691 [0.0, 0.1, 368.0, 640.0]
    人脸(face) 0.9165 [127.2, 82.9, 332.5, 366.2]
    人手(hand) 0.8350 [75.5, 286.1, 240.8, 511.5]
    Done! Device=cpu, Inference=74.19ms, Detections=3
    Saved result to: output/result_cpu.jpg

Precision comparison run:

    $ python inference.py --compare
    CPU vs NPU Precision Comparison (feature-level)
    ============================================================
    cls_score level 0: max_diff=0.013860, mean_diff=0.000810, cos_sim=1.000000
    cls_score level 1: max_diff=0.005308, mean_diff=0.000906, cos_sim=1.000000
    cls_score level 2: max_diff=0.003007, mean_diff=0.000804, cos_sim=1.000000
    cls_score level 3: max_diff=0.001945, mean_diff=0.000782, cos_sim=1.000000
    bbox_pred level 0: max_diff=0.020175, mean_diff=0.001067, cos_sim=0.999999
    bbox_pred level 1: max_diff=0.014530, mean_diff=0.001104, cos_sim=1.000000
    bbox_pred level 2: max_diff=0.008117, mean_diff=0.001090, cos_sim=1.000000
    bbox_pred level 3: max_diff=0.002050, mean_diff=0.000483, cos_sim=1.000000
    Matched detections: 3
    Score difference: mean=0.000054, max=0.000059

Benchmark run:

    $ python inference.py --benchmark --bench-iters 30
    ============================================================
    Performance Benchmark: device=cpu, input_size=320, iters=30
    ============================================================
    FPS: 21.0
    ============================================================
    Performance Benchmark: device=npu:0, input_size=320, iters=30
    ============================================================
    FPS: 95.8

[Image: NPU inference result]

[Image: CPU inference result]

| Feature level | Max diff | Mean diff | Cosine similarity |
|---|---|---|---|
| cls_score level 0 | 0.0139 | 0.0008 | 1.000000 |
| cls_score level 1 | 0.0053 | 0.0009 | 1.000000 |
| cls_score level 2 | 0.0030 | 0.0008 | 1.000000 |
| cls_score level 3 | 0.0019 | 0.0008 | 1.000000 |
| bbox_pred level 0 | 0.0202 | 0.0011 | 0.999999 |
| bbox_pred level 1 | 0.0145 | 0.0011 | 1.000000 |
| bbox_pred level 2 | 0.0081 | 0.0011 | 1.000000 |
| bbox_pred level 3 | 0.0021 | 0.0005 | 1.000000 |
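
The three feature-level metrics in the table above can be computed as in the sketch below. This is a pure-Python illustration, not the code in inference.py: `compare_features` is a hypothetical helper, and the sample values are made up. It assumes each feature map has already been flattened to a list of floats.

```python
import math


def compare_features(cpu_values, npu_values):
    """Return (max_diff, mean_diff, cos_sim) for two equally long flat sequences."""
    if len(cpu_values) != len(npu_values) or not cpu_values:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(cpu_values, npu_values)]
    dot = sum(a * b for a, b in zip(cpu_values, npu_values))
    norm_cpu = math.sqrt(sum(a * a for a in cpu_values))
    norm_npu = math.sqrt(sum(b * b for b in npu_values))
    if norm_cpu == 0.0 or norm_npu == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return max(diffs), sum(diffs) / len(diffs), dot / (norm_cpu * norm_npu)


# Made-up values standing in for one flattened feature map per device.
cpu_feat = [0.10, -1.25, 2.00, 0.75]
npu_feat = [0.11, -1.25, 1.99, 0.75]
max_diff, mean_diff, cos_sim = compare_features(cpu_feat, npu_feat)
print(f"max_diff={max_diff:.6f}, mean_diff={mean_diff:.6f}, cos_sim={cos_sim:.6f}")
```

Cosine similarity is insensitive to a uniform scale error, which is why it is reported together with the absolute differences rather than on its own.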

| Detection | Class match | Score diff | BBox diff (pixels) |
|---|---|---|---|
| Det 0 (body) | ✓ | 0.000059 | [0.00, 0.00, 0.00, 0.00] |
| Det 1 (face) | ✓ | 0.000049 | [0.00, 0.00, 0.01, 0.00] |
| Det 2 (hand) | ✓ | 0.000054 | [0.01, 0.00, 0.00, 0.00] |
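
Conclusion: CPU and NPU results agree closely. Feature-level cosine similarity is ≥ 0.999999, the largest detection score difference is below 0.0001, and bounding-box coordinates differ by at most 0.01 pixels.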
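
A detection-level comparison of this kind can be sketched as follows. The sketch pairs detections by index, which is an assumption; how inference.py matches detections is not documented here. The inputs are the rounded scores and boxes from the example console output, so the score differences come out coarser than the figures in the table.

```python
def compare_detections(cpu_dets, npu_dets):
    """Pair detections by index; each is (label, score, [x1, y1, x2, y2])."""
    rows = []
    for (label_c, score_c, box_c), (label_n, score_n, box_n) in zip(cpu_dets, npu_dets):
        rows.append({
            "class_match": label_c == label_n,
            "score_diff": abs(score_c - score_n),
            "bbox_diff": [abs(c - n) for c, n in zip(box_c, box_n)],
        })
    return rows


# Rounded values copied from the example CPU and NPU runs.
CPU_DETS = [
    (0, 0.9691, [0.0, 0.1, 368.0, 640.0]),
    (1, 0.9165, [127.2, 82.9, 332.5, 366.2]),
    (2, 0.8350, [75.5, 286.1, 240.8, 511.5]),
]
NPU_DETS = [
    (0, 0.9692, [0.0, 0.1, 368.0, 640.0]),
    (1, 0.9165, [127.2, 82.9, 332.5, 366.2]),
    (2, 0.8349, [75.5, 286.1, 240.8, 511.5]),
]

rows = compare_detections(CPU_DETS, NPU_DETS)
for index, row in enumerate(rows):
    print(index, row)
```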

| Detection | Class | ModelScope score | Our score | IoU |
|---|---|---|---|---|
| Body | body | 0.9679 | 0.9691 | >0.99 |
| Face | face | 0.8987 | 0.9165 | 0.97 |
| Hand | hand | 0.8202 | 0.8350 | >0.90 |
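
Scores differ by less than 2% (attributed to numerical precision differences in the Integral softmax), and IoU is above 0.90 for all three detections, so accuracy is acceptable.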
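
IoU appears twice in this document: as the agreement measure against the ModelScope pipeline and as the overlap criterion in NMS (threshold 0.5). The following is a minimal pure-Python sketch of both, assuming boxes in [x1, y1, x2, y2] form. The greedy `nms` here handles a single class and is an illustration, not the routine used by inference.py; the sample boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS for one class; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box above the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Made-up boxes: the second overlaps the first heavily and is suppressed.
boxes = [[0, 0, 100, 100], [5, 5, 105, 105], [200, 200, 300, 300]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

For per-class NMS, the same function would be called once per class on the boxes that pass the score threshold.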

| Device | Mean latency | Std dev | FPS | Speedup |
|---|---|---|---|---|
| CPU | 47.21 ms | 0.24 ms | 21.2 | 1.0× |
| NPU (Ascend 910B) | 10.43 ms | 0.11 ms | 95.9 | 4.53× |

Test conditions: batch_size=1, input_size=320×320, mean over 100 iterations, warmup=5.
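
The FPS and speedup columns follow directly from the mean latencies, and the measurement loop can be sketched as below. This is a minimal sketch, not the benchmark code in inference.py: the `benchmark` helper, its `sync` hook, and the use of the population standard deviation are assumptions. On an asynchronous device the timer should only be read after the device has finished, which is what `sync` is for.

```python
import statistics
import time


def benchmark(run_once, iters=100, warmup=5, sync=None):
    """Time run_once() and return (mean_ms, std_ms, fps).

    sync is an optional callable that blocks until queued device work is done
    (assumed to be needed on asynchronous devices such as an NPU).
    """
    for _ in range(warmup):
        run_once()
    if sync:
        sync()
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        if sync:
            sync()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples_ms)
    std_ms = statistics.pstdev(samples_ms)
    return mean_ms, std_ms, 1000.0 / mean_ms


# Derived columns, recomputed from the mean latencies in the table.
cpu_ms, npu_ms = 47.21, 10.43
print(round(1000.0 / cpu_ms, 1), round(1000.0 / npu_ms, 1))  # 21.2 95.9
print(round(cpu_ms / npu_ms, 2))  # 4.53
```

A call such as `benchmark(lambda: model(batch), iters=100, warmup=5)` would match the stated test conditions, with `model` and `batch` supplied by the caller.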

Directory layout:

    nanodet-face-human-hand/
    ├── inference.py          # Inference script (model definition + inference + evaluation + benchmark)
    ├── README.md             # Deployment guide
    ├── models/               # Model weights
    │   └── iic/cv_nanodet_face-human-hand-detection/
    │       ├── pytorch_model.bin
    │       └── test_face_human_hand_detection.jpg
    └── output/               # Inference outputs
        ├── result_cpu.jpg
        ├── result_npu.jpg
        ├── precision_comparison.json
        ├── precision_detail.json
        └── benchmark.json

Model architecture:

    Input (3×320×320, BGR)
           │
           ▼
    ShuffleNetV2-1.0x Backbone
      ├─ stage2 → C2 (116, 40×40)
      ├─ stage3 → C3 (232, 20×20)
      └─ stage4 → C4 (464, 10×10)
           │
           ▼
    GhostPAN Neck
      ├─ P4 (96, 10×10) ← top-down from C4
      ├─ P3 (96, 20×20) ← top-down from C3 + P4
      └─ P2 (96, 40×40) ← top-down from C2 + P3
           │
           ▼
    NanoDetPlus Head (GFL)
      ├─ cls_score: 4 levels × num_classes
      └─ bbox_pred: 4 levels × 4×(reg_max+1)
           │
           ▼
    Post-processing
      ├─ Integral (softmax → weighted sum → distance)
      ├─ distance2bbox (anchor + distance → xyxy)
      ├─ Per-class score filtering (thresh=0.3)
      └─ NMS (IoU thresh=0.5)

GhostPAN forward-pass alignment: in the original ModelScope implementation, feat_low is taken from inner_outs (the output of the previous fusion step) rather than from the original inputs. This port follows that behavior exactly.
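
Integral decoding: the GFL bounding-box prediction is passed through softmax, reshaped to (..., 4, reg_max+1), and matrix-multiplied with the bin indices [0, ..., reg_max] to obtain the distances. This is an expectation over the bins, not a plain sum.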
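
The Integral step can be illustrated in pure Python for a single anchor. This is a sketch under stated assumptions: `integral_decode` is a hypothetical helper, and multiplying the expected bin index by the level's stride to get pixels follows common GFL practice rather than anything stated in this document.

```python
import math

REG_MAX = 7  # from the model table: GFL head with reg_max=7, so 8 bins per side


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def integral_decode(side_logits, stride=1.0):
    """Decode 4 x (REG_MAX + 1) logits into (left, top, right, bottom) distances."""
    distances = []
    for logits in side_logits:
        if len(logits) != REG_MAX + 1:
            raise ValueError("each side needs REG_MAX + 1 logits")
        probs = softmax(logits)
        # Expected bin index: dot product of probabilities with [0, 1, ..., REG_MAX].
        expected = sum(p * k for k, p in enumerate(probs))
        distances.append(expected * stride)  # stride scaling is an assumption
    return distances


uniform = [0.0] * (REG_MAX + 1)           # equal weight on every bin
peaked = [0.0, 0.0, 50.0] + [0.0] * 5     # almost all weight on bin 2
print(integral_decode([uniform, peaked, uniform, peaked]))
```

A uniform distribution decodes to the midpoint 3.5, and a sharply peaked one decodes to (almost exactly) its peak bin, which shows why the result is a weighted average rather than a sum of the probabilities.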

Preprocessing normalization: ModelScope computes (img/255 - mean/255) / (std/255), which is equivalent to (img - mean) / std. The mean and std are the ImageNet values in BGR order.
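
The equivalence of the two normalization forms can be checked numerically. In the sketch below, `MEAN_BGR` and `STD_BGR` are the commonly used ImageNet statistics on the 0-255 scale, reordered to BGR; they are an assumption, since this document does not list the exact values.

```python
import math

# Commonly used ImageNet statistics in BGR order (assumed, not taken from the source).
MEAN_BGR = (103.53, 116.28, 123.675)
STD_BGR = (57.375, 57.12, 58.395)


def normalize_scaled(pixel_bgr):
    """ModelScope-style form: (img/255 - mean/255) / (std/255)."""
    return [(p / 255.0 - m / 255.0) / (s / 255.0)
            for p, m, s in zip(pixel_bgr, MEAN_BGR, STD_BGR)]


def normalize_direct(pixel_bgr):
    """Equivalent form: (img - mean) / std."""
    return [(p - m) / s for p, m, s in zip(pixel_bgr, MEAN_BGR, STD_BGR)]


pixel = (0, 128, 255)  # one BGR pixel
scaled = normalize_scaled(pixel)
direct = normalize_direct(pixel)
assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(scaled, direct))
print(direct)
```

The factor 1/255 cancels between numerator and denominator, so the two forms agree up to floating-point rounding.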

Anchor centers: anchor centers are built as stride × (grid + 0.5), that is, at the center of each grid cell rather than at its corner.
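
The anchor-center formula and the subsequent distance2bbox step can be sketched as follows. The stride of 32 is illustrative, and clamping the box to the input size is an assumption; both helper functions are written for this example and are not copied from inference.py.

```python
import math


def anchor_centers(input_size, stride):
    """Return (cx, cy) for every cell of one feature level, row by row."""
    cells = math.ceil(input_size / stride)
    return [(stride * (col + 0.5), stride * (row + 0.5))
            for row in range(cells) for col in range(cells)]


def distance2bbox(center, distances, input_size):
    """Turn (left, top, right, bottom) distances into an xyxy box."""
    cx, cy = center
    left, top, right, bottom = distances
    box = [cx - left, cy - top, cx + right, cy + bottom]
    # Clamping to the input is an assumption made for this sketch.
    return [min(max(v, 0.0), float(input_size)) for v in box]


centers = anchor_centers(320, 32)
print(len(centers), centers[0], centers[-1])  # 100 (16.0, 16.0) (304.0, 304.0)
print(distance2bbox(centers[0], (20.0, 10.0, 30.0, 40.0), 320))  # [0.0, 6.0, 46.0, 56.0]
```

With the +0.5 offset, the first center sits half a stride inside the image; without it, it would sit on the image corner and every decoded box would shift by half a stride.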

NPU device adaptation: all tensors are moved to the target device with .to(device), and NPU outputs are moved back with .cpu() before post-processing.
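
The device handling described here can be sketched as below. Both helpers are hypothetical and written for illustration. `pick_device` assumes that importing torch_npu registers the `torch.npu` backend, and it falls back to CPU if anything goes wrong. `run_on_device` assumes the model returns a single tensor-like object with a `.cpu()` method; a model that returns a tuple would need each element moved back.

```python
import importlib.util


def pick_device(requested=None):
    """Return the requested device, else 'npu:0' when usable, else 'cpu'."""
    if requested:
        return requested
    if importlib.util.find_spec("torch") and importlib.util.find_spec("torch_npu"):
        try:
            import torch
            import torch_npu  # noqa: F401  (assumed to register torch.npu)
            if torch.npu.is_available():
                return "npu:0"
        except Exception:
            pass  # broken or partial install: fall back to CPU
    return "cpu"


def run_on_device(model, batch, device):
    """Move the input to the device, run the model, bring the result back."""
    outputs = model(batch.to(device))
    return outputs.cpu()


print(pick_device("cpu"))  # cpu
```

Keeping post-processing on the CPU means the same decoding and NMS code serves both devices, so only the forward pass differs between the CPU and NPU runs.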

This model is distributed under the original ModelScope license.