duguang-ocr-onnx: Ascend NPU Inference Adaptation

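Overview

This repository provides an inference solution for duguang-ocr-onnx on Huawei Ascend NPUs (Ascend 910). It runs in two modes, CPU (ONNX Runtime) and NPU (torch_npu + onnx2torch), and includes tools for evaluating and comparing precision and performance between the two.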

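Model architecture

Component	Architecture	Input	Output
Text detection (DBNet)	ConvNeXt + DBNet	[1,3,800,800] (1600 and 2400 also available)	[1,800,800] probability map
Text recognition (CRNN)	LightweightEdge (lightweight CRNN)	[3,3,32,300] (small) / [N,3,32,300 or 640] (large)	[3,75,7644] CTC logits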
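CTC logits such as the recognizer's [3,75,7644] output are commonly turned into text by greedy decoding: take the best class at each time step, collapse consecutive repeats, and drop the blank class. The following is a minimal sketch of that general technique, not the logic used in inference.py. It assumes NumPy; `ctc_greedy_decode`, the toy vocabulary, and the blank position are all hypothetical. The 7644 classes are presumably the 7643 vocabulary entries plus one blank, but which index holds the blank is an assumption that must be checked against the model.

```python
import numpy as np

def ctc_greedy_decode(logits, vocab, blank):
    """Greedy CTC decoding for logits of shape [batch, time, classes].

    `blank` is the class index of the CTC blank. Its position is an
    assumption here and must match the model's label layout. Class indices
    below `blank` map to vocab[i]; indices above it map to vocab[i - 1].
    """
    best = np.argmax(logits, axis=-1)  # [batch, time]
    texts = []
    for seq in best:
        chars = []
        prev = None
        for idx in seq:
            idx = int(idx)
            # Emit a character only when the class changes and is not blank.
            if idx != prev and idx != blank:
                chars.append(vocab[idx if idx < blank else idx - 1])
            prev = idx
        texts.append("".join(chars))
    return texts

# Toy example: 3-character vocabulary, blank assumed at the last index (3).
toy_vocab = ["a", "b", "c"]
toy_path = [0, 0, 3, 0, 1, 1]                # a a <blank> a b b
toy_logits = np.eye(4)[toy_path][None, ...]  # one-hot logits, shape [1, 6, 4]
decoded = ctc_greedy_decode(toy_logits, toy_vocab, blank=3)
```

If the blank index passed to the decoder does not match the one the model was trained with, the blank class is rendered as a real character, so it is worth verifying this setting before judging recognition quality.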

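Model versions

Two variants are provided, small and large:

Feature	small	large
Detection input size	800/1600/2400	800/1600/2400
Recognition batch size	Fixed at 3	Dynamic (1-8 recommended)
Recognition input width	300	300 or 640
Speed/accuracy trade-off	Faster	More accurate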

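Requirements

Component	Required version
Python	≥ 3.10
ONNX Runtime	≥ 1.15 (`pip install onnxruntime`)
Ascend CANN	≥ 6.0 (required for NPU mode)
PyTorch	≥ 2.1 (required for NPU mode)
torch_npu	The release matching the installed CANN version
onnx2torch	≥ 1.5 (required for NPU mode)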

Installing dependencies

# Base dependencies (CPU mode)
pip install numpy opencv-python scipy onnxruntime

# Additional dependencies for NPU mode
pip install torch torch_npu onnx2torch

Quick start

1. Download the model

# Download from ModelScope (small version)
git clone https://modelscope.cn/duguang/duguang-ocr-onnx.git

# Or use the copy already downloaded in this repository
ls /opt/atomgit/models/duguang-ocr-onnx/mscoder/duguang-ocr-onnx/

2. CPU inference

cd deliverables

python3 inference.py \
  -i test_label.png \
  -m /path/to/duguang-ocr-onnx/small

# Or let the script auto-detect the model directory
python3 inference.py -i test_label.png

3. NPU inference

python3 inference.py \
  -i test_label.png \
  -m /path/to/duguang-ocr-onnx/small \
  --mode npu

4. CPU vs. NPU precision comparison

python3 inference.py \
  -i test_label.png \
  -m /path/to/duguang-ocr-onnx/small \
  --mode compare

5. Large model

python3 inference.py \
  -i test_label.png \
  -m /path/to/duguang-ocr-onnx/large \
  --mode compare \
  --img-size 1600

Inference results

Sample CPU inference output

The test image test_label.png (800x400) contains the following text:

收件人: 张三
联系电话: 13800138000
地址: 北京市海淀区中关村大街1号
商品: 电子产品 数量: 2

Using the small model as an example, detection finds 9 text boxes at accurate positions:

============================================================
  OCR 推理 - 模式: CPU
============================================================
  输入图像: test_label.png (800x400)
  检测模型: model_800x800.onnx
  检测耗时: 0.603s, 检测框: 9 个
  识别模型: model.onnx
  模型版本: small
  词表大小: 7643
  识别文本行: 9 行

    [1] 在劳在独在
    ...
    [9] 和在外在完在好在无在损在情在况在...

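Note: The recognition model was trained on domain-specific return/refund template text (its vocabulary begins with the characters "在保证商品和外包完好无损情况下与本店客服联系百代采购周期"), so its recognition quality on general text is limited. The detection model (DBNet) works correctly and localizes text regions accurately.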

Visualization

Each run writes result.png, which draws the detection boxes and recognized text over the original image.

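Precision comparison report

Numerical precision (model output layer)

Element-wise numerical comparison of CPU (ONNX Runtime) vs. NPU (torch_npu):

Metric	Value
Input	[3, 3, 32, 300] (small model)
CPU inference time	344.0 ms
NPU inference time	114.8 ms (~3x speedup)
Max absolute error	5.08e-05
Mean absolute error	5.12e-06
Cosine similarity	1.00000000
ArgMax agreement	100.00%

Conclusion: In FP32, the CPU and NPU outputs agree to within a maximum absolute error of 5.08e-05, with 100% ArgMax agreement and a cosine similarity of 1.0 to eight decimal places. The onnx2torch conversion therefore introduces no numerical loss that changes the predictions.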
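The four metrics in the table can be computed from two output arrays of the same shape. The sketch below shows one way to do so with NumPy; it is an illustration under stated assumptions, not the code in eval_precision.py. The function name `compare_outputs` and the synthetic inputs are hypothetical, and ArgMax is assumed to be taken over the last (class) axis.

```python
import numpy as np

def compare_outputs(cpu_out, npu_out):
    """Element-wise comparison of two model outputs with identical shape."""
    a = np.asarray(cpu_out, dtype=np.float64)
    b = np.asarray(npu_out, dtype=np.float64)
    abs_err = np.abs(a - b)
    flat_a, flat_b = a.ravel(), b.ravel()
    cosine = float(flat_a @ flat_b /
                   (np.linalg.norm(flat_a) * np.linalg.norm(flat_b)))
    # Fraction of positions where both outputs pick the same class
    # (assumption: the class axis is the last one).
    agreement = float(np.mean(a.argmax(axis=-1) == b.argmax(axis=-1)))
    return {
        "max_abs_err": float(abs_err.max()),
        "mean_abs_err": float(abs_err.mean()),
        "cosine_similarity": cosine,
        "argmax_agreement": agreement,
    }

# Synthetic stand-ins for the two backends' logits (not real model output).
rng = np.random.default_rng(0)
cpu_logits = rng.standard_normal((3, 75, 16))
npu_logits = cpu_logits + 1e-5  # small constant offset as a fake backend error
metrics = compare_outputs(cpu_logits, npu_logits)
```

ArgMax agreement is the metric that decides whether the decoded text can differ: small absolute errors only matter when they flip the winning class at some time step.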

Inference performance comparison

Stage	CPU (ms)	NPU (ms)	Speedup
Text detection	~1086	n/a	n/a
Text recognition	~344	~115	3.0x

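File structure

deliverables/
├── inference.py                  # Main inference script (CPU/NPU/compare)
├── eval_precision.py             # Precision evaluation script
├── eval_performance.py           # Performance evaluation script
├── run_log.txt                   # Run log (template)
├── precision_report.json         # Precision evaluation output
└── performance_report.json       # Performance evaluation output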

Using the inference script

# CPU inference (default)
python3 inference.py -i test_label.png -m <model_dir>

# NPU inference
python3 inference.py -i test_label.png -m <model_dir> --mode npu

# CPU + NPU precision comparison
python3 inference.py -i test_label.png -m <model_dir> --mode compare

# Numerical precision evaluation (model output layer)
python3 inference.py -i test_label.png -m <model_dir> --precision

# Performance evaluation
python3 inference.py -i test_label.png -m <model_dir> --mode cpu --benchmark
python3 inference.py -i test_label.png -m <model_dir> --mode npu --benchmark

Precision evaluation script

# Full evaluation (numerical precision + text consistency)
python3 eval_precision.py -m <model_dir> -i test_label.png

# Numerical precision only
python3 eval_precision.py -m <model_dir> --numerical-only --num-cases 10

# Text consistency only
python3 eval_precision.py -m <model_dir> -i test_label.png --text-only

Performance evaluation script

# End-to-end performance comparison (CPU + NPU)
python3 eval_performance.py -m <model_dir> -i test_label.png --mode all --runs 20

# Detection model only
python3 eval_performance.py -m <model_dir> --level detection --mode all

# Recognition model only
python3 eval_performance.py -m <model_dir> --level recognition --mode all
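A repeated-run measurement like the one the --runs option suggests is usually structured as a few untimed warm-up calls followed by timed calls. The sketch below illustrates that pattern with the standard library only; it does not reproduce eval_performance.py. The `benchmark` helper, its defaults, and the dummy workload are hypothetical.

```python
import statistics
import time

def benchmark(fn, runs=20, warmup=3):
    """Time `fn` over `runs` calls after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # warm-up absorbs one-time costs such as lazy initialization
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "mean_ms": statistics.mean(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }

# Dummy workload standing in for a model call (assumption for illustration).
stats = benchmark(lambda: sum(range(1000)), runs=5, warmup=1)
```

When timing an asynchronous device such as an NPU, the callable should wait for the device to finish before returning (for example by synchronizing or copying the result back to the host); otherwise the timer only measures how long it takes to enqueue the work.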

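Notes

Model domain limitation: the small and large recognition models were trained on domain-specific return/refund template text, so recognition quality on general text is limited. For general-purpose OCR, consider EasyOCR or PaddleOCR.
Small model batch limitation: the small recognition model has a fixed input shape of [3, 3, 32, 300]; when fewer than 3 text lines are available, the batch is padded with empty images.
Detection model size: choose the detection model according to the image size:
- 800x800: general use (recommended)
- 1600x1600: larger images or higher accuracy requirements
- 2400x2400: very large images (large version only)
First NPU conversion: the first onnx2torch conversion takes roughly 10-30 s; subsequent inference runs at normal speed.
Precision: in FP32, the model output error between CPU and NPU is below 1e-4, with 100% ArgMax agreement.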
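The fixed batch size of the small recognition model means text-line crops have to be grouped into batches of exactly 3, with the last batch padded. A minimal sketch of that grouping follows, assuming NumPy and assuming an "empty image" is an all-zero array; the actual fill value and helper names used by inference.py are not shown in this document, so `pad_to_fixed_batch` is hypothetical.

```python
import numpy as np

def pad_to_fixed_batch(crops, batch=3):
    """Split crops of shape [n, 3, 32, 300] into fixed-size batches.

    Returns a list of (batch_array, valid_count) pairs. The last batch is
    padded with zero images (assumed meaning of "empty image"), and
    valid_count tells the caller how many outputs to keep.
    """
    crops = np.asarray(crops, dtype=np.float32)
    batches = []
    for start in range(0, crops.shape[0], batch):
        chunk = crops[start:start + batch]
        valid = chunk.shape[0]
        if valid < batch:
            pad = np.zeros((batch - valid,) + chunk.shape[1:], dtype=chunk.dtype)
            chunk = np.concatenate([chunk, pad], axis=0)
        batches.append((chunk, valid))
    return batches

# Four dummy crops produce two batches: one full, one with a single valid row.
dummy_crops = np.ones((4, 3, 32, 300), dtype=np.float32)
padded = pad_to_fixed_batch(dummy_crops)
```

Keeping the valid count alongside each batch lets the caller discard the recognizer outputs that correspond to padding rows.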

License

This project is released under the Apache 2.0 License. Copyright of the models belongs to their original authors.