gcw_coj3XaOd/cv_nanodet_face-human-hand-detection

NanoDet Face-Human-Hand Detection: Ascend NPU Deployment

Inference adaptation and deployment guide for the ModelScope model iic/cv_nanodet_face-human-hand-detection on Huawei Ascend NPUs.

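Model Overview

| Attribute | Value |
|---|---|
| Source model | iic/cv_nanodet_face-human-hand-detection |
| Classes | body, face, hand |
| Backbone | ShuffleNetV2-1.0x |
| Neck | GhostPAN (Path Aggregation Network with Ghost modules) |
| Head | NanoDetPlus (GFL head, reg_max=7) |
| Input size | 320×320 (BGR) |
| Model size | 17 MB (645 parameter tensors) |
| Inference framework | PyTorch + torch_npu |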

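Requirements

| Component | Version |
|---|---|
| Python | ≥ 3.9 |
| PyTorch | 2.9.0 |
| torch_npu | 2.9.0.post1 |
| CANN | 8.5.1 |
| OpenCV | ≥ 4.0 |
| NumPy | ≥ 1.20 |
| Hardware | Atlas 800 A2 (Ascend 910B) or a compatible device |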

Quick Start

1. Download the model

```bash
# Download via the ModelScope SDK
pip install modelscope
python -c "
from modelscope import snapshot_download
snapshot_download('iic/cv_nanodet_face-human-hand-detection',
                  cache_dir='./models')
"
```

Model weights path: models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin

2. Single-image inference

```bash
# Select the device automatically (NPU preferred)
python inference.py --image test.jpg

# Force CPU
python inference.py --image test.jpg --device cpu

# Force NPU
python inference.py --image test.jpg --device npu

# Custom parameters
python inference.py \
    --model-path models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin \
    --image test.jpg \
    --device npu \
    --input-size 320 \
    --score-thresh 0.3 \
    --output-dir ./output
```
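When `--device` is omitted, the script picks the NPU if one is usable and otherwise falls back to the CPU. The internals of `inference.py` are not shown in this README, so the following is only a minimal sketch of NPU-first selection, assuming that importing `torch_npu` registers the `torch.npu` namespace. The name `pick_device` is hypothetical.

```python
import importlib.util


def pick_device(requested=None):
    """Return a device string, preferring an Ascend NPU when available.

    Hypothetical helper: illustrates the documented behaviour
    (NPU first, CPU fallback), not the actual code of inference.py.
    """
    if requested == "cpu":
        return "cpu"
    if requested == "npu":
        return "npu:0"
    # torch_npu is optional; without it only the CPU can be used.
    if importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    import torch
    import torch_npu  # noqa: F401  (registers the torch.npu backend)
    return "npu:0" if torch.npu.is_available() else "cpu"
```

An explicit request is honoured as-is; only the automatic path probes for hardware.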

3. CPU vs. NPU precision comparison

```bash
python inference.py --compare
```

4. Performance benchmark

```bash
python inference.py --benchmark --bench-iters 100
```

Inference API

```python
from inference import load_model, inference, NanoDetPreProcessor, NanoDetPostProcessor

# Load the model
model = load_model('models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin',
                   device='npu:0')

# Run inference
labels, bboxes, scores = inference(model, 'test.jpg', device='npu:0',
                                    input_size=320, score_thresh=0.3)

# Interpreting the results
# labels:  [0, 1, 2] → [body, face, hand]
# bboxes:  [[x1, y1, x2, y2], ...] in pixel coordinates
# scores:  [0.96, 0.89, 0.82] confidence scores
```
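`inference` returns three parallel sequences. A small sketch for turning them into readable records, assuming the label mapping given in the comments above (0 = body, 1 = face, 2 = hand). `CLASS_NAMES` and `describe_detections` are hypothetical names, not part of `inference.py`.

```python
# Assumed label mapping, taken from the comments in the API example above.
CLASS_NAMES = {0: "body", 1: "face", 2: "hand"}


def describe_detections(labels, bboxes, scores, min_score=0.0):
    """Zip parallel detection outputs into dicts, highest score first."""
    records = []
    for label, box, score in zip(labels, bboxes, scores):
        if score < min_score:
            continue
        x1, y1, x2, y2 = (float(v) for v in box)
        records.append({
            "class": CLASS_NAMES.get(int(label), f"unknown({label})"),
            "score": float(score),
            "box": (x1, y1, x2, y2),
            "size": (x2 - x1, y2 - y1),  # (width, height) in pixels
        })
    return sorted(records, key=lambda r: r["score"], reverse=True)
```

With the NPU results shown below, this would list body, face and hand in that order.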

Sample Output

NPU inference output

```text
$ python inference.py --image test_face_human_hand_detection.jpg --device npu

Loading model from: models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin
Device: npu
  Loaded 645/645 parameters from checkpoint
Model loaded successfully!

Warming up...
Warm-up done.

Running inference on: test_face_human_hand_detection.jpg
Inference time: 162.72 ms

============================================================
Detection Results (score > 0.3):
============================================================
  Class                   Score BBox (x1,y1,x2,y2)
  ------------------------------------------------------------
  人体(body)               0.9692 [0.0, 0.1, 368.0, 640.0]
  人脸(face)               0.9165 [127.2, 82.9, 332.5, 366.2]
  人手(hand)               0.8349 [75.5, 286.1, 240.8, 511.5]

Done! Device=npu, Inference=162.72ms, Detections=3
Saved result to: output/result_npu.jpg
```

CPU inference output

```text
$ python inference.py --image test_face_human_hand_detection.jpg --device cpu

Loading model from: models/iic/cv_nanodet_face-human-hand-detection/pytorch_model.bin
Device: cpu
  Loaded 645/645 parameters from checkpoint

Running inference on: test_face_human_hand_detection.jpg
Inference time: 74.19 ms

============================================================
Detection Results (score > 0.3):
============================================================
  Class                   Score BBox (x1,y1,x2,y2)
  ------------------------------------------------------------
  人体(body)               0.9691 [0.0, 0.1, 368.0, 640.0]
  人脸(face)               0.9165 [127.2, 82.9, 332.5, 366.2]
  人手(hand)               0.8350 [75.5, 286.1, 240.8, 511.5]

Done! Device=cpu, Inference=74.19ms, Detections=3
Saved result to: output/result_cpu.jpg
```

CPU vs. NPU precision comparison output

```text
$ python inference.py --compare

CPU vs NPU Precision Comparison (feature-level)
============================================================
  cls_score level 0: max_diff=0.013860, mean_diff=0.000810, cos_sim=1.000000
  cls_score level 1: max_diff=0.005308, mean_diff=0.000906, cos_sim=1.000000
  cls_score level 2: max_diff=0.003007, mean_diff=0.000804, cos_sim=1.000000
  cls_score level 3: max_diff=0.001945, mean_diff=0.000782, cos_sim=1.000000
  bbox_pred level 0: max_diff=0.020175, mean_diff=0.001067, cos_sim=0.999999
  bbox_pred level 1: max_diff=0.014530, mean_diff=0.001104, cos_sim=1.000000
  bbox_pred level 2: max_diff=0.008117, mean_diff=0.001090, cos_sim=1.000000
  bbox_pred level 3: max_diff=0.002050, mean_diff=0.000483, cos_sim=1.000000

  Matched detections: 3
  Score difference: mean=0.000054, max=0.000059
```

Benchmark output

```text
$ python inference.py --benchmark --bench-iters 30

============================================================
Performance Benchmark: device=cpu, input_size=320, iters=30
============================================================
  FPS: 21.0

============================================================
Performance Benchmark: device=npu:0, input_size=320, iters=30
============================================================
  FPS: 95.8
```

Visualization

NPU inference result:

[Figure: NPU inference result]

CPU inference result:

[Figure: CPU inference result]

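Precision Validation Report

CPU vs. NPU feature-level comparison

| Feature map | Max diff | Mean diff | Cosine similarity |
|---|---|---|---|
| cls_score level 0 | 0.0139 | 0.0008 | 1.000000 |
| cls_score level 1 | 0.0053 | 0.0009 | 1.000000 |
| cls_score level 2 | 0.0030 | 0.0008 | 1.000000 |
| cls_score level 3 | 0.0019 | 0.0008 | 1.000000 |
| bbox_pred level 0 | 0.0202 | 0.0011 | 0.999999 |
| bbox_pred level 1 | 0.0145 | 0.0011 | 1.000000 |
| bbox_pred level 2 | 0.0081 | 0.0011 | 1.000000 |
| bbox_pred level 3 | 0.0021 | 0.0005 | 1.000000 |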
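The three columns can be computed per output tensor as below. This is a sketch assuming both outputs have already been moved to the CPU as arrays of the same shape; `compare_features` is a hypothetical name, and the exact reductions used by `inference.py --compare` are not documented here.

```python
import numpy as np


def compare_features(cpu_out, npu_out):
    """Element-wise error statistics and cosine similarity of two feature maps."""
    a = np.asarray(cpu_out, dtype=np.float64).ravel()
    b = np.asarray(npu_out, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = np.abs(a - b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    cos_sim = float(a @ b / denom) if denom > 0 else float("nan")
    return {
        "max_diff": float(diff.max()),
        "mean_diff": float(diff.mean()),
        "cos_sim": cos_sim,
    }
```

Cosine similarity is scale-invariant and dominated by large-magnitude elements, so it can read 1.000000 even when individual logits differ by about 0.01; that is why the absolute differences are reported alongside it.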

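CPU vs. NPU detection-level comparison

| Detection | Class match | Score diff | BBox diff (px) |
|---|---|---|---|
| Det 0 (body) | ✓ | 0.000059 | [0.00, 0.00, 0.00, 0.00] |
| Det 1 (face) | ✓ | 0.000049 | [0.00, 0.00, 0.01, 0.00] |
| Det 2 (hand) | ✓ | 0.000054 | [0.01, 0.00, 0.00, 0.00] |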

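Conclusion: CPU and NPU inference results agree closely. Feature-level cosine similarity is ≥ 0.999999, the largest detection score difference is < 0.0001 (0.000059), and bounding-box coordinates differ by at most 0.01 px.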

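Comparison with the ModelScope Baseline

| Class | ModelScope score | Our score | IoU |
|---|---|---|---|
| body | 0.9679 | 0.9691 | >0.99 |
| face | 0.8987 | 0.9165 | 0.97 |
| hand | 0.8202 | 0.8350 | >0.90 |

Score differences stay within 2% (the largest is for face, 0.9165 vs. 0.8987) and are attributed to numerical differences in the Integral softmax. Every IoU exceeds 0.90, so accuracy is considered acceptable.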
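Both the CPU vs. NPU detection-level comparison and the baseline comparison pair detections from two runs and report IoU or box differences. A minimal sketch of greedy same-class matching by IoU, assuming boxes in `[x1, y1, x2, y2]` pixel coordinates. `box_iou` and `match_detections` are hypothetical helpers; the matching rule actually used by `inference.py` is not documented.

```python
import numpy as np


def box_iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(ref, other, min_iou=0.5):
    """Greedily pair same-class detections from two runs.

    `ref` and `other` are lists of (label, box, score) tuples.
    Returns one (label, iou, score_diff, abs_box_diff) tuple per match.
    """
    used = set()
    matches = []
    for label, box, score in sorted(ref, key=lambda d: -d[2]):
        best_j, best_iou = None, min_iou
        for j, (o_label, o_box, _) in enumerate(other):
            if j in used or o_label != label:
                continue
            iou = box_iou(box, o_box)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is None:
            continue  # unmatched detection in `ref`
        used.add(best_j)
        o_box, o_score = other[best_j][1], other[best_j][2]
        box_diff = np.abs(np.asarray(box, float) - np.asarray(o_box, float))
        matches.append((label, best_iou, abs(score - o_score), box_diff))
    return matches
```

Feeding it the CPU and NPU detections from the sample logs yields three matches with IoU 1.0 and score differences around 0.0001 at the four-decimal precision printed there.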

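Performance Benchmark

| Device | Mean latency | Std dev | FPS | Speedup |
|---|---|---|---|---|
| CPU | 47.21 ms | 0.24 ms | 21.2 | 1.0× |
| NPU (Ascend 910B) | 10.43 ms | 0.11 ms | 95.9 | 4.53× |

Test conditions: batch_size=1, input_size=320×320, mean over 100 iterations, warmup=5. The sample benchmark log above was produced with --bench-iters 30 and reports slightly different FPS values.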
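The table reports mean latency, standard deviation and FPS = 1000 / mean latency (ms). A sketch of such a timing loop, assuming `run_once` wraps one preprocess + forward call; `benchmark` and its `sync` argument are hypothetical. NPU kernels are launched asynchronously, so a device synchronization (for example `torch.npu.synchronize()` once `torch_npu` is imported) has to run before the clock is read, otherwise only launch time is measured.

```python
import statistics
import time


def benchmark(run_once, iters=100, warmup=5, sync=None):
    """Time `run_once` and return latency statistics in milliseconds.

    `sync` is an optional callable that blocks until queued device work
    has finished; it runs before every clock read so asynchronous
    kernels are fully counted.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # first calls carry one-off setup cost; not timed
        run_once()
    sync()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        sync()
        latencies.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies)
    return {
        "mean_ms": mean_ms,
        "std_ms": statistics.stdev(latencies) if iters > 1 else 0.0,
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }
```

The table is consistent with this arithmetic: 1000 / 47.21 ≈ 21.2 FPS, 1000 / 10.43 ≈ 95.9 FPS, and 47.21 / 10.43 ≈ 4.53× speedup.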

Project Structure

```text
nanodet-face-human-hand/
├── inference.py          # Inference script (model definition + inference + evaluation + benchmark)
├── README.md             # Deployment documentation
├── models/               # Model weights
│   └── iic/cv_nanodet_face-human-hand-detection/
│       ├── pytorch_model.bin
│       └── test_face_human_hand_detection.jpg
└── output/               # Inference outputs
    ├── result_cpu.jpg
    ├── result_npu.jpg
    ├── precision_comparison.json
    ├── precision_detail.json
    └── benchmark.json
```

Model Architecture

```text
Input (3×320×320, BGR)
  │
  ▼
ShuffleNetV2-1.0x Backbone
  ├─ stage2 → C2 (116, 40×40), stride 8
  ├─ stage3 → C3 (232, 20×20), stride 16
  └─ stage4 → C4 (464, 10×10), stride 32
        │
        ▼
GhostPAN Neck (top-down + bottom-up fusion, 96 channels)
  ├─ P2 (96, 40×40), stride 8
  ├─ P3 (96, 20×20), stride 16
  ├─ P4 (96, 10×10), stride 32
  └─ P5 (96, 5×5),   stride 64 (extra level)
        │
        ▼
NanoDetPlus Head (GFL)
  ├─ cls_score: 4 levels × num_classes
  └─ bbox_pred: 4 levels × 4×(reg_max+1) = 32
        │
        ▼
Post-processing
  ├─ Integral (softmax → weighted sum → distance)
  ├─ distance2bbox (anchor + distance → xyxy)
  ├─ Per-class score filtering (thresh=0.3)
  └─ NMS (IoU thresh=0.5)
```
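The post-processing stages in the diagram can be sketched in NumPy for a single image. This is an illustrative reimplementation, not the code in `inference.py`, and it rests on these assumptions: `cls_scores` are already sigmoid-activated per-class scores, `reg_logits` has 4×(reg_max+1) = 32 columns per prior, priors are rows of (cx, cy, stride) concatenated over the four levels, predicted distances are in stride units (as in upstream NanoDet-Plus), `max_shape` is (height, width), and NMS is plain greedy per-class NMS. All function names are hypothetical.

```python
import numpy as np


def integral(reg_logits, reg_max=7):
    """GFL Integral: (N, 4*(reg_max+1)) logits -> (N, 4) expected distances in bins."""
    logits = reg_logits.reshape(-1, 4, reg_max + 1)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    prob = np.exp(logits)
    prob /= prob.sum(axis=-1, keepdims=True)
    bins = np.arange(reg_max + 1, dtype=prob.dtype)  # [0, 1, ..., reg_max]
    return prob @ bins  # expectation over the bins, shape (N, 4)


def distance2bbox(centers, dist, max_shape=None):
    """(N, 2) centers + (N, 4) (left, top, right, bottom) distances -> (N, 4) xyxy."""
    boxes = np.stack([
        centers[:, 0] - dist[:, 0],
        centers[:, 1] - dist[:, 1],
        centers[:, 0] + dist[:, 2],
        centers[:, 1] + dist[:, 3],
    ], axis=-1)
    if max_shape is not None:  # assumed (height, width)
        h, w = max_shape
        boxes[:, 0::2] = boxes[:, 0::2].clip(0, w)
        boxes[:, 1::2] = boxes[:, 1::2].clip(0, h)
    return boxes


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = scores.argsort()[::-1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        union = np.maximum(areas[i] + areas[rest] - inter, 1e-12)
        order = rest[inter / union <= iou_thresh]  # drop boxes overlapping box i
    return keep


def postprocess(cls_scores, reg_logits, priors, score_thresh=0.3,
                iou_thresh=0.5, max_shape=None):
    """Decode, threshold per class (score > thresh) and run NMS per class."""
    dist = integral(reg_logits) * priors[:, 2:3]  # bins -> pixels via stride
    boxes = distance2bbox(priors[:, :2], dist, max_shape)
    results = []
    for c in range(cls_scores.shape[1]):
        mask = cls_scores[:, c] > score_thresh
        if not mask.any():
            continue
        b, s = boxes[mask], cls_scores[mask, c]
        for k in nms(b, s, iou_thresh):
            results.append((c, b[k].tolist(), float(s[k])))
    return results
```

With uniform regression logits, every side decodes to the mean bin 3.5, i.e. 28 px at stride 8, which makes a quick sanity check for the decoding path.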

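Key Adaptation Notes

  1. GhostPAN forward-pass fix: in the original ModelScope implementation, feat_low is taken from inner_outs (the fused output of the previous step) rather than from the raw inputs; this implementation follows that behavior exactly.

  2. Integral decoding: the GFL box prediction is decoded by applying softmax, reshaping to (..., 4, reg_max+1), and multiplying by the bin vector [0, 1, ..., reg_max] (a matrix multiplication) to obtain the distance for each side, rather than by a plain sum.

  3. Preprocessing normalization: ModelScope computes (img/255 - mean/255) / (std/255), which is equivalent to (img - mean) / std, where mean and std are the ImageNet statistics in BGR order.

  4. Anchor centers: anchor centers are placed at stride × (grid + 0.5), i.e. at the center of each grid cell rather than at its corner.

  5. NPU device adaptation: all tensors are moved with .to(device); after NPU inference, outputs are moved back with .cpu() for post-processing.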
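Notes 3 and 4 can be checked with a short NumPy sketch. The mean/std values are the ImageNet BGR statistics used in upstream NanoDet configs; that the ModelScope checkpoint uses exactly these numbers is an assumption. `normalize_modelscope_style`, `normalize` and `center_priors` are hypothetical helper names, and resizing/letterboxing is omitted.

```python
import numpy as np

# ImageNet statistics in BGR order (assumed, taken from upstream NanoDet configs).
MEAN_BGR = np.array([103.53, 116.28, 123.675], dtype=np.float32)
STD_BGR = np.array([57.375, 57.12, 58.395], dtype=np.float32)


def normalize_modelscope_style(img):
    """(img/255 - mean/255) / (std/255), the form described for ModelScope."""
    img = img.astype(np.float32)
    return (img / 255 - MEAN_BGR / 255) / (STD_BGR / 255)


def normalize(img):
    """Algebraically identical form: (img - mean) / std, on an HxWx3 BGR image."""
    return (img.astype(np.float32) - MEAN_BGR) / STD_BGR


def center_priors(feat_h, feat_w, stride):
    """Rows of (cx, cy, stride) at cell centers, stride * (grid + 0.5), row-major."""
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    cx = (xs.ravel() + 0.5) * stride
    cy = (ys.ravel() + 0.5) * stride
    return np.stack([cx, cy, np.full_like(cx, stride)], axis=-1)
```

The normalized image still needs an HWC to NCHW conversion, e.g. `normalize(img).transpose(2, 0, 1)[None]`, before it is passed to the network; the two normalization forms differ only by float32 rounding.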

License

This model is distributed under the original ModelScope license.