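Duguang OCR ONNX - Ascend NPU Adaptation

Duguang OCR (读光 OCR) is an optical character recognition model that supports Chinese and English text. This project adapts the original ONNX model to the Ascend NPU so that it can run high-performance inference on the Huawei Ascend 910 NPU.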

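Adaptation Overview

Item	Details
Original model	mscoder/duguang-ocr-onnx-v2
Target platform	Huawei Ascend 910 NPU (Atlas 800 A2)
CANN version	8.5.1 (V100R001C25SPC002B220)
NPU driver	25.5.2
Adaptation method	Offline ONNX-to-OM model conversion with ATC + inference via ACL Python
Accuracy	NPU vs. CPU relative error < 0.1% (all outputs pass the < 1% threshold)
Performance	Detection model 100.02x speedup, recognition model 26.98x speedup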

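Model Architecture

The Duguang OCR model consists of two sub-models:

Text detection model (Detection) - based on the SegLink++ / ResNet18 architecture, with multi-scale feature detection
- Input: [batch, 512, 512, 3] (NHWC)
- Output: 6 scales × 3 heads (classification, link, regression)
- Supported resolutions: 512×512, 1024×1024, 1600×1600
Text recognition model (Recognition) - based on the CRNN architecture
- Input: [batch, 3, 32, 300] (NCHW)
- Output: [1, seq_len, 7644] (CTC logits)
- Vocabulary: 7644 Chinese and English characters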
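
Because the recognition model emits CTC logits, its output has to be decoded before it becomes text. The following is a minimal sketch of greedy CTC decoding in plain Python, not the decoder used by inference.py. The function name ctc_greedy_decode, the toy vocabulary, and the choice of index 0 as the blank class are all assumptions made for illustration; the real blank index and index-to-character mapping depend on recognition_vocab.txt.

```python
from itertools import groupby


def ctc_greedy_decode(logits, vocab, blank=0):
    """Greedy CTC decoding for a single sequence.

    logits: seq_len rows, each a list of per-class scores.
    vocab:  mapping from class index to character (hypothetical layout).
    blank:  index of the CTC blank class (assumed here; verify against
            the real vocabulary file before use).
    """
    # 1. Pick the highest-scoring class at every time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in logits]
    # 2. Collapse consecutive repeats.
    collapsed = [idx for idx, _ in groupby(best)]
    # 3. Drop blanks and map the remaining indices to characters.
    return "".join(vocab[idx] for idx in collapsed if idx != blank)


# Toy example with 4 classes: blank, "a", "b", "c".
toy_vocab = {1: "a", 2: "b", 3: "c"}
toy_logits = [
    [0.1, 0.8, 0.0, 0.1],  # a
    [0.1, 0.7, 0.1, 0.1],  # a again (repeat, collapsed)
    [0.9, 0.0, 0.1, 0.0],  # blank
    [0.0, 0.9, 0.1, 0.0],  # a (kept, because a blank separates it)
    [0.0, 0.1, 0.8, 0.1],  # b
]
decoded = ctc_greedy_decode(toy_logits, toy_vocab)
print(decoded)
```

In a real pipeline the same three steps would typically be applied with an array argmax over the last axis of the [1, seq_len, 7644] output.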

Deployment Guide

Environment Requirements

# System environment
OS: Linux (openEuler / CentOS / Ubuntu)
Arch: aarch64
CANN: >= 8.5.0
NPU: Ascend 910 series

# Python dependencies
pip install onnx onnxruntime modelscope numpy opencv-python pillow

Model Download and Conversion

# 1. Download the original model
modelscope download --model mscoder/duguang-ocr-onnx-v2 --local_dir ./duguang-ocr-onnx-v2

# 2. Prepare the model file (fix ONNX opset compatibility)
python3 -c "
import onnx
model = onnx.load('models/detection_base_512x512.onnx')
# Remove the ai.onnx.ml domain (for ATC compatibility)
new_imports = [o for o in model.opset_import if o.domain != 'ai.onnx.ml']
while len(model.opset_import) > 0:
    model.opset_import.pop()
for opset in new_imports:
    model.opset_import.append(opset)
onnx.save(model, 'models/detection_base_512x512_atc.onnx')
"

# 3. ATC model conversion (ONNX -> OM)
# Detection model
atc --model=models/detection_base_512x512_atc.onnx \
    --framework=5 \
    --output=npu_models/detection_base_512x512 \
    --soc_version=Ascend910_9362 \
    --input_format=ND \
    --input_shape="input_images:0:1,512,512,3"

# Recognition model (requires batch_size=3, width=300)
atc --model=models/recognition_base.onnx \
    --framework=5 \
    --output=npu_models/recognition_base \
    --soc_version=Ascend910_9362 \
    --input_format=ND \
    --input_shape="input_images:3,3,32,300"
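
When both conversions need to be repeated (for example after changing the SOC version), it can help to generate the two atc invocations from one place. The sketch below only assembles argument lists from the flags already shown above; build_atc_command and the jobs list are hypothetical names, and nothing is executed.

```python
import shlex


def build_atc_command(model, output, soc_version, input_shape,
                      framework=5, input_format="ND"):
    """Return the atc argument list for one ONNX -> OM conversion."""
    return [
        "atc",
        f"--model={model}",
        f"--framework={framework}",  # 5 selects ONNX as the source framework
        f"--output={output}",
        f"--soc_version={soc_version}",
        f"--input_format={input_format}",
        f"--input_shape={input_shape}",
    ]


# The two conversions from this guide, expressed as data.
jobs = [
    {
        "model": "models/detection_base_512x512_atc.onnx",
        "output": "npu_models/detection_base_512x512",
        "input_shape": "input_images:0:1,512,512,3",
    },
    {
        "model": "models/recognition_base.onnx",
        "output": "npu_models/recognition_base",
        "input_shape": "input_images:3,3,32,300",
    },
]
commands = [build_atc_command(soc_version="Ascend910_9362", **job) for job in jobs]

for cmd in commands:
    # Print a shell-ready line; nothing is run here.
    print(shlex.join(cmd))
```

To actually run a conversion, each list could be passed to subprocess.run(cmd, check=True) on a machine where the CANN environment is set up.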

Running Inference

# Accuracy evaluation (NPU vs. CPU)
python3 inference.py --mode precision

# Performance evaluation
python3 inference.py --mode performance

# Full pipeline demo
python3 inference.py --mode all

Accuracy Evaluation Results

Detection Model (Detection)

Output accuracy of the NPU (Ascend 910) compared with the CPU (ONNX Runtime):

Output	Shape	Max abs. error	Mean abs. error	Relative error	Result
Output[0]	(1, 128, 128, 2)	0.009723	0.001148	0.0164%	PASS
Output[1]	(1, 128, 128, 32)	0.017229	0.001061	0.0059%	PASS
Output[2]	(1, 128, 128, 6)	0.032419	0.005643	0.0571%	PASS
Output[3]	(1, 64, 64, 2)	0.010605	0.000946	0.0096%	PASS
Output[4]	(1, 64, 64, 48)	0.031164	0.001366	0.0058%	PASS
Output[5]	(1, 64, 64, 6)	0.052956	0.001664	0.0065%	PASS
Output[6]	(1, 32, 32, 2)	0.005631	0.001136	0.0226%	PASS
Output[7]	(1, 32, 32, 48)	0.019940	0.001777	0.0142%	PASS
Output[8]	(1, 32, 32, 6)	0.035810	0.002291	0.0265%	PASS
Output[9]	(1, 16, 16, 2)	0.004137	0.000991	0.0282%	PASS
Output[10]	(1, 16, 16, 48)	0.012721	0.001661	0.0246%	PASS
Output[11]	(1, 16, 16, 6)	0.032527	0.003268	0.0399%	PASS
Output[12]	(1, 8, 8, 2)	0.007587	0.001942	0.0470%	PASS
Output[13]	(1, 8, 8, 48)	0.013199	0.002385	0.0351%	PASS
Output[14]	(1, 8, 8, 6)	0.091441	0.008367	0.0963%	PASS
Output[15]	(1, 4, 4, 2)	0.005012	0.002173	0.0547%	PASS
Output[16]	(1, 4, 4, 48)	0.011495	0.002711	0.0417%	PASS
Output[17]	(1, 4, 4, 6)	0.062672	0.008204	0.0977%	PASS

Recognition Model (Recognition)

Output	Shape	Max abs. error	Mean abs. error	Relative error	Result
Output[0]	(1, 201, 7644)	0.046191	0.003555	0.0287%	PASS

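Accuracy Summary

✅ All 19 outputs pass the accuracy test (relative error < 1%)

Detection model: 18/18 passed
Recognition model: 1/1 passed
Maximum relative error: 0.0977% (far below the 1% threshold)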
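
The three columns in the tables above can be computed along the following lines. This is a sketch in plain Python over flattened outputs, not the code from inference.py. In particular, the document does not say how "relative error" is defined; the version here uses the L2 norm of the difference divided by the L2 norm of the CPU reference, which is one common choice and an assumption on our part. The name compare_outputs is hypothetical.

```python
import math


def compare_outputs(npu, cpu, threshold=0.01):
    """Compare two equally long flat sequences of floats.

    Returns max/mean absolute error, a relative error, and a pass flag.
    """
    if len(npu) != len(cpu) or not cpu:
        raise ValueError("outputs must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(npu, cpu)]
    max_abs = max(diffs)
    mean_abs = sum(diffs) / len(diffs)
    # Assumed definition: ||npu - cpu||_2 / ||cpu||_2.
    ref_norm = math.sqrt(sum(b * b for b in cpu))
    if ref_norm == 0.0:
        rel = 0.0 if max_abs == 0.0 else float("inf")
    else:
        rel = math.sqrt(sum(d * d for d in diffs)) / ref_norm
    return {
        "max_abs": max_abs,
        "mean_abs": mean_abs,
        "rel": rel,
        "passed": rel < threshold,  # 1% threshold, as in this document
    }


# Tiny example: the reference has norm 5, the NPU output is off by 0.005.
report = compare_outputs(npu=[3.0, 4.005], cpu=[3.0, 4.0])
print(report)
```

With real model outputs, the arrays would first be flattened, and an array library would normally replace the explicit loops.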

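Performance Evaluation Results

Test Environment

Setting	CPU environment	NPU environment
Inference framework	ONNX Runtime 1.26.0	CANN ACL 8.5.1
Hardware	Kunpeng 920	Ascend 910
Precision	FP32	FP32
Test rounds	50 (5 warm-up)	50 (5 warm-up)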

Inference Latency

Model	CPU latency (ms)	NPU latency (ms)	Speedup
Detection model (1×512×512×3)	391.94	3.92	100.02x
Recognition model (3×3×32×300)	215.86	8.00	26.98x
Full pipeline (end-to-end)	~608	~13	~48x

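Performance Analysis

Detection model: NPU inference takes only 3.92 ms, a 100.02x speedup over the CPU, mainly thanks to the NPU's efficient parallel execution of convolutional networks
Recognition model: NPU inference takes only 8.00 ms, a 26.98x speedup; the LSTM/FC layers in the CRNN are significantly accelerated on the NPU
End-to-end pipeline: total inference latency is under 13 ms, which meets the needs of real-time OCR scenarios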
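
The latency figures above follow a warm-up-then-measure protocol (5 warm-up runs, 50 timed runs). A minimal sketch of such a harness is shown below; benchmark and speedup are hypothetical names, and the timed callable here is a stand-in rather than a real model call.

```python
import statistics
import time


def benchmark(fn, warmup=5, rounds=50):
    """Return the mean latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


def speedup(cpu_ms, npu_ms):
    """Speedup factor of the NPU over the CPU."""
    return cpu_ms / npu_ms


# Stand-in workload: record each call so the run count can be checked.
calls = []
mean_ms = benchmark(lambda: calls.append(1), warmup=5, rounds=50)

# Recompute the recognition speedup from the latencies in the table.
recognition_speedup = round(speedup(215.86, 8.00), 2)
print(len(calls), recognition_speedup)
```

Applying the same formula to the rounded detection latencies (391.94 / 3.92) gives roughly 99.98x, slightly below the reported 100.02x, presumably because the reported ratio was computed from unrounded timings.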

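Key Issues Encountered During Adaptation and Their Solutions

Issue 1: ATC ONNX path validation

Symptom: ATC rejects paths that contain the + character

Solution: copy the model files to a path that contains no special characters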
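
The copy step can be scripted so that it fails early if the destination is still unsuitable. The sketch below is illustrative only: copy_to_safe_path is a hypothetical helper, and it treats + as the only forbidden character because that is the only one named above.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_safe_path(src, dest_dir, forbidden="+"):
    """Copy src into dest_dir under a name free of forbidden characters."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    if any(ch in str(dest_dir) for ch in forbidden):
        raise ValueError(f"destination still contains one of {forbidden!r}")
    # Replace forbidden characters in the file name itself.
    safe_name = "".join("_" if ch in forbidden else ch for ch in src.name)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / safe_name
    shutil.copyfile(src, dest)
    return dest


# Demo with a throwaway directory whose name contains "+".
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "ocr+npu"
    src_dir.mkdir()
    src = src_dir / "model.onnx"
    src.write_bytes(b"demo")
    dest = copy_to_safe_path(src, Path(tmp) / "models")
    copied_ok = dest.read_bytes() == b"demo" and "+" not in str(dest)

print(copied_ok)
```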

Issue 2: ONNX opset domain conflict

Symptom: The model has 2 --domain_version fields, but only one is allowed

Solution: remove the opset import for the ai.onnx.ml domain and keep only the default domain
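
Dropping an opset import is only safe if no node actually uses operators from that domain. A quick check along the following lines can be run before stripping it. This is a sketch: nodes_using_domain is a hypothetical helper, it relies only on the name, op_type and domain attributes that ONNX nodes carry, and it inspects top-level graph nodes only (subgraphs inside control-flow operators are not visited). The example uses stand-in objects so it runs without a model file.

```python
from types import SimpleNamespace


def nodes_using_domain(model, domain="ai.onnx.ml"):
    """List (name, op_type) of top-level graph nodes that belong to `domain`."""
    return [(n.name, n.op_type) for n in model.graph.node if n.domain == domain]


# Stand-in for a loaded model: two nodes in the default domain.
fake_model = SimpleNamespace(graph=SimpleNamespace(node=[
    SimpleNamespace(name="conv1", op_type="Conv", domain=""),
    SimpleNamespace(name="relu1", op_type="Relu", domain=""),
]))

blockers = nodes_using_domain(fake_model)
safe_to_strip = not blockers
print(safe_to_strip)
```

With a real model, the object returned by onnx.load(...) would be passed in place of fake_model; a non-empty result means the domain is in use and removing its opset import would leave the model invalid.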

Issue 3: SOC version detection

Symptom: common SOC version names fail ATC validation

Solution: use torch_npu to query the device name, Ascend910_9362, which is valid in CANN 8.5.1

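Issue 4: Dynamic shape in the recognition model

Symptom: the recognition model only supports the input combination batch_size=3, width=300

Solution: fix the input shape to [3, 3, 32, 300] and pad or truncate inputs during preprocessing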
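
The padding and truncation described here has two parts: forcing every crop to width 300, and filling the last batch up to 3 items. The sketch below shows both on plain nested lists with small sizes so the result is easy to inspect. fit_width and fixed_batches are hypothetical names, and padding on the right with a constant fill value is an assumption; the actual preprocessing in inference.py may differ.

```python
def fit_width(image, width=300, fill=0):
    """Right-pad or truncate every row of an H x W image to exactly `width`."""
    return [row[:width] + [fill] * max(0, width - len(row)) for row in image]


def fixed_batches(items, batch_size=3, pad_item=None):
    """Split items into full batches.

    Returns (batch, real_count) pairs so that results for padding entries
    can be discarded after inference.
    """
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        real_count = len(chunk)
        chunk = chunk + [pad_item] * (batch_size - real_count)
        batches.append((chunk, real_count))
    return batches


# Small widths keep the example readable; the model itself needs width=300.
narrow = fit_width([[1, 2], [3, 4]], width=4)    # padded on the right
wide = fit_width([[1, 2, 3, 4, 5]], width=4)     # truncated
batches = fixed_batches(["crop0", "crop1", "crop2", "crop3"], pad_item="blank")
print(narrow, wide, batches)
```

Keeping real_count alongside each batch matters because the model always returns 3 results, and only the first real_count of them correspond to actual text crops.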

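Issue 5: ACL Python API return values

Symptom: several functions in the ACL Python bindings return a tuple (handle, ret_code) rather than a single return code

Solution: implement the helper methods _rt_ok() and _rt_code() to handle both cases uniformly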

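Issue 6: Shared ACL initialization across multiple instances

Symptom: when multiple NPUInference instances are created at the same time, acl.init() fails with error 100002 (repeated initialization)

Solution: add a class-level _acl_initialized flag so that only the first instance calls acl.init() and the last instance calls acl.finalize()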
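
One way to realize "first instance initializes, last instance finalizes" is a class-level reference count. Note that this sketch uses a counter rather than the boolean _acl_initialized flag named above, since a flag alone cannot tell which instance is the last one. SharedRuntime is a hypothetical class, and the init and finalize callables are injected so the example runs without the ACL library.

```python
import threading


class SharedRuntime:
    """Reference-counted init/finalize guard shared by all instances."""

    _lock = threading.Lock()
    _users = 0  # number of live instances across the process

    def __init__(self, init_fn, finalize_fn):
        self._finalize_fn = finalize_fn
        self._closed = False
        with SharedRuntime._lock:
            if SharedRuntime._users == 0:
                init_fn()  # would be acl.init() in the real script
            SharedRuntime._users += 1

    def close(self):
        with SharedRuntime._lock:
            if self._closed:
                return  # closing twice must not decrement twice
            self._closed = True
            SharedRuntime._users -= 1
            if SharedRuntime._users == 0:
                self._finalize_fn()  # would be acl.finalize()


# Two instances share one init/finalize pair.
events = []
a = SharedRuntime(lambda: events.append("init"), lambda: events.append("finalize"))
b = SharedRuntime(lambda: events.append("init"), lambda: events.append("finalize"))
a.close()
a.close()  # idempotent
b.close()
print(events)
```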

File List

.
├── inference.py              # NPU/CPU inference script
├── README.md                 # This document
├── models/                   # ONNX model files
│   ├── detection_base_512x512.onnx
│   ├── detection_base_512x512_atc.onnx  # ATC-compatible fixed version
│   ├── recognition_base_orig.onnx
│   ├── recognition_vocab.txt
│   └── configuration.json
├── npu_models/               # OM models produced by ATC
│   ├── detection_base_512x512.om
│   └── recognition_base.om
└── eval_output/              # Evaluation results
    ├── precision_results.json
    ├── performance_results.json
    └── test_input.png

Original Model Sources

Model	Original repository
base_seglink++ (detection)	cv_resnet18_ocr-detection-line-level_damo
base_seglink++ (recognition)	cv_convnextTiny_ocr-recognition-general_damo

Reference

Adaptation completed: 2026-05-20
Toolchain: CANN 8.5.1 + ATC + ACL Python API
Validation: accuracy PASS (relative error < 1%), performance speedup 27-100x