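SVHN Multi-digit Recognition (Ascend NPU adaptation)

A multi-digit house-number recognition model adapted to the Ascend NPU. It uses a deep convolutional neural network to read house numbers from street-view images.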

Requirements

Hardware: Huawei Ascend Atlas 800 A2/A3 (Ascend 910B/910A)
CANN: 8.5.1+
Python: 3.11+
PyTorch: 2.x with torch_npu

Installing dependencies

pip install torch torch_npu torchvision pillow numpy
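
After installation it can be useful to confirm that the NPU backend is actually visible to PyTorch before running inference. The following is a minimal sketch, not part of this repository: `pick_device` is a hypothetical helper, and it assumes that importing `torch_npu` registers the `npu` device so that `torch.npu.is_available()` can be queried.

```python
import importlib.util


def pick_device(preferred: str = "npu") -> str:
    """Return "npu" only if it was requested and is usable, otherwise "cpu".

    Hypothetical helper for illustration; inference.py may do this differently.
    """
    if preferred != "npu":
        return "cpu"
    # Without both packages installed there is no NPU backend to use.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    if importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    import torch
    import torch_npu  # noqa: F401  (assumed to register the "npu" device on import)

    return "npu" if torch.npu.is_available() else "cpu"


print(pick_device("npu"))
```

If this prints `cpu` on an Ascend machine, the CANN installation or the torch_npu version is the first thing to check.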

Quick start

Download the model

pip install modelscope
modelscope download --model Genius-Society/svhn

NPU inference

python inference.py --image <image_path> --device npu

CPU inference (for comparison)

python inference.py --image <image_path> --device cpu

Accuracy comparison (CPU vs NPU)

python inference.py --image <image_path> --device npu --compare

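Model description

Item	Description
Model name	SVHN Multi-digit Recognition
Task	Multi-digit house-number recognition
Architecture	8 convolutional layers + 2 fully connected layers + 6 output heads
Input	RGB image (automatically resized to 54×54)
Output	Digit count + predictions for 5 digit positions (each position: 0-9 or blank)
Parameters	42,868,734 (about 43M)
Training dataset	Google Street View House Numbers (SVHN)
Original accuracy	89% (as officially reported)
Framework	PyTorch + torch_npu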
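
The six outputs have to be combined into a single house number. The sketch below shows one plausible way to do that. It rests on assumptions that this document does not confirm: that the length head's class index equals the number of digits, and that index 10 of each 11-way digit head is the blank class. `decode`, `argmax`, `one_hot`, and `BLANK` are hypothetical names used only for illustration.

```python
from typing import List, Sequence

BLANK = 10  # ASSUMPTION: index of the "blank" class in each 11-way digit head


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode(length_scores: Sequence[float],
           digit_scores: List[Sequence[float]]) -> str:
    """Turn the length head and the five digit heads into a number string."""
    # ASSUMPTION: length class k means "k digits"; cap at the number of heads.
    n = min(argmax(length_scores), len(digit_scores))
    digits = [argmax(head) for head in digit_scores[:n]]
    # Skip any position that was predicted as blank.
    return "".join(str(d) for d in digits if d != BLANK)


def one_hot(index: int, size: int) -> List[float]:
    """Toy scores with a single confident class, for the example only."""
    return [1.0 if i == index else 0.0 for i in range(size)]


length = one_hot(2, 7)
heads = [one_hot(4, 11), one_hot(2, 11)] + [one_hot(BLANK, 11)] * 3
print(decode(length, heads))  # prints 42
```

The class mapping in model.py should be checked before relying on this decoding.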

Model architecture

Input (3×54×54)
  ├── Conv2d(3→48, 5×5) + BN + ReLU
  ├── Conv2d(48→64, 5×5) + BN + ReLU + MaxPool(2×2)
  ├── Conv2d(64→128, 5×5) + BN + ReLU
  ├── Conv2d(128→160, 5×5) + BN + ReLU + MaxPool(2×2)
  ├── Conv2d(160→192, 5×5) + BN + ReLU
  ├── Conv2d(192→192, 5×5) + BN + ReLU + MaxPool(2×2)
  ├── Conv2d(192→192, 5×5) + BN + ReLU
  ├── Conv2d(192→192, 5×5) + BN + ReLU
  ├── AdaptiveAvgPool2d(7×7) → Flatten(9408)
  ├── Linear(9408→3072) + ReLU
  ├── Linear(3072→3072) + ReLU
  └── Output Heads:
       ├── digit_length: Linear(3072→7)   # number of digits
       ├── digit1: Linear(3072→11)        # 1st digit
       ├── digit2: Linear(3072→11)        # 2nd digit
       ├── digit3: Linear(3072→11)        # 3rd digit
       ├── digit4: Linear(3072→11)        # 4th digit
       └── digit5: Linear(3072→11)        # 5th digit
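
The layer sizes above can be checked against the stated parameter count with a little arithmetic. The sketch below assumes that every convolution and linear layer has a bias and that each BatchNorm layer contributes a learnable scale and shift per channel (running statistics are buffers and are not counted). Under those assumptions the total matches the 42,868,734 reported in the model description.

```python
# (in_channels, out_channels) for the eight 5x5 convolutions
CONV_CHANNELS = [(3, 48), (48, 64), (64, 128), (128, 160),
                 (160, 192), (192, 192), (192, 192), (192, 192)]
KERNEL = 5
# (in_features, out_features) for the two fully connected layers
FC_LAYERS = [(9408, 3072), (3072, 3072)]
# digit_length head plus five digit heads
HEADS = [(3072, 7)] + [(3072, 11)] * 5


def conv_params(cin: int, cout: int, k: int = KERNEL) -> int:
    """Weights plus one bias per output channel (bias is an assumption)."""
    return cin * cout * k * k + cout


def bn_params(channels: int) -> int:
    """Learnable scale and shift per channel (affine BatchNorm assumed)."""
    return 2 * channels


def linear_params(fin: int, fout: int) -> int:
    """Weights plus one bias per output feature (bias is an assumption)."""
    return fin * fout + fout


# The flatten size follows from 192 channels on a 7x7 grid.
assert 192 * 7 * 7 == 9408

total = sum(conv_params(i, o) + bn_params(o) for i, o in CONV_CHANNELS)
total += sum(linear_params(i, o) for i, o in FC_LAYERS + HEADS)
print(f"{total:,}")  # prints 42,868,734
```

About two thirds of the parameters (28,904,448) sit in the first fully connected layer, so that layer dominates the checkpoint size.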

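Ascend NPU adaptation: validation results

Accuracy validation

Metric	Result
Max NPU vs CPU deviation	2.04 × 10⁻⁴
Classification agreement	100% (NPU and CPU predictions match on every test sample)
Accuracy verdict	✅ PASS (< 1%)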

Deviation per output head:

Output head	Mean deviation	Max deviation
digit_length	6.43e-05	1.54e-04
digit1	9.28e-05	2.00e-04
digit2	1.06e-04	1.84e-04
digit3	7.91e-05	1.87e-04
digit4	6.60e-05	2.04e-04
digit5	5.07e-05	1.53e-04
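
A per-head comparison like the one in this table can be sketched as follows. This is an illustration, not the code in evaluation/eval_precision.py: it assumes the deviation is the absolute difference between raw CPU and NPU outputs for one sample, and that agreement means both devices pick the same class. The function names and the sample values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def head_deviation(cpu_out: Sequence[float],
                   npu_out: Sequence[float]) -> Tuple[float, float]:
    """Mean and max absolute difference between two output vectors."""
    diffs = [abs(c - n) for c, n in zip(cpu_out, npu_out)]
    return sum(diffs) / len(diffs), max(diffs)


def compare(cpu_heads: Dict[str, Sequence[float]],
            npu_heads: Dict[str, Sequence[float]]) -> Dict[str, dict]:
    """Per-head deviation and whether both devices predict the same class."""
    report = {}
    for name, cpu_out in cpu_heads.items():
        npu_out = npu_heads[name]
        mean_dev, max_dev = head_deviation(cpu_out, npu_out)
        report[name] = {
            "mean": mean_dev,
            "max": max_dev,
            "same_class": argmax(cpu_out) == argmax(npu_out),
        }
    return report


# Made-up values for illustration only, not measurements from this project.
cpu = {"digit1": [0.1, 2.0, -1.0]}
npu = {"digit1": [0.1002, 1.9999, -1.0]}
report = compare(cpu, npu)
print(report["digit1"]["same_class"])
```

Aggregating over many samples would take the mean of the per-sample means and the maximum of the per-sample maxima for each head.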

Performance validation

Metric	CPU	NPU	Speedup
Mean latency	111.00 ms	1.38 ms	80.6×
P50 latency	110.64 ms	1.37 ms	80.8×
P95 latency	115.09 ms	1.41 ms	81.6×
P99 latency	118.73 ms	1.48 ms	80.2×
Throughput (bs=1)	9.0 imgs/s	726.4 imgs/s	80.7×

NPU throughput by batch size:

Batch size	Latency	Throughput
1	1.40 ms	714.7 imgs/s
4	1.43 ms	2789.5 imgs/s
8	1.46 ms	5464.5 imgs/s
16	1.81 ms	8833.2 imgs/s
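
Latency percentiles and throughput of this kind are typically derived from repeated timed runs. The sketch below is one way to do it with the standard library. It is not the code in evaluation/eval_performance.py, and `benchmark` with its parameters is a hypothetical helper. Throughput is computed as batch size divided by mean latency, which agrees with the table to within rounding: 16 images in 1.81 ms is about 8,840 imgs/s.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 10, iters: int = 100,
              batch_size: int = 1) -> Dict[str, float]:
    """Time fn() repeatedly and summarise latency (ms) and throughput.

    On an NPU, fn should block until the device has finished (for example
    by calling torch.npu.synchronize(), assumed available via torch_npu),
    otherwise only the asynchronous launch time is measured.
    """
    for _ in range(warmup):  # warm-up runs are not recorded
        fn()
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    mean_ms = statistics.fmean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "imgs_per_s": batch_size * 1000.0 / mean_ms,
    }


# Stand-in workload; replace with a call that runs the model on one batch.
stats = benchmark(lambda: sum(range(1000)), warmup=2, iters=20)
print(sorted(stats))
```

Because latency grows only from 1.40 ms to 1.81 ms between batch sizes 1 and 16, throughput scales almost linearly with batch size in this range.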

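Evaluation method

The source code for the accuracy and performance evaluations is in the evaluation/ directory:

# Accuracy evaluation (CPU vs NPU, 100 random samples + real images)
python evaluation/eval_precision.py

# Performance evaluation (latency + throughput)
python evaluation/eval_performance.py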

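File list

svhn_npu_adapter/
├── model.py                      # Model definition (reconstructed from the checkpoint)
├── inference.py                  # NPU inference script
├── README.md                     # This document
├── test_load.py                  # Model loading test
└── evaluation/
    ├── eval_precision.py         # Accuracy evaluation script
    └── eval_performance.py       # Performance evaluation script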

Acknowledgements
