RapidOCR NPU Adaptation (Ascend NPU)

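An inference adaptation of RapidOCR for Huawei Ascend NPUs. The ONNX models are converted with onnx2torch and run on the torch_npu backend. The PP-OCRv5 text detection (DBNet) and text recognition (SVTR/CRNN) pipeline is supported.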

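Model overview

Original repository: RapidAI/RapidOCR
Framework: PyTorch (via ONNX → torch conversion)
Task: OCR text detection + recognition
Detection model: PP-OCRv5 DBNet (MobileNetV3 backbone + FPN neck + DB head)
Recognition model: PP-OCRv5 SVTR/CRNN (MobileNetV3 backbone + CTC head)
Character set: 18383 Chinese characters (including punctuation, English letters, and digits)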

Environment requirements

Component	Version
CANN	8.5.1
PyTorch	2.9.0
torch_npu	2.9.0.post1
onnx2torch	1.5.15
onnx	1.21.0
opencv-python-headless	4.11.0
NPU Hardware	Ascend910_9362 × 2

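Quick start

1. Download the models

pip install modelscope
modelscope download --model RapidAI/RapidOCR

2. Install dependencies

This command does not install PyTorch or torch_npu; the versions listed under Environment requirements are assumed to be present already.

pip install onnx2torch onnx opencv-python-headless pillow numpy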

3. Run inference

# NPU inference
python3 inference.py --image test.jpg --device npu

# CPU inference (baseline for comparison)
python3 inference.py --image test.jpg --device cpu

# NPU vs CPU accuracy comparison
python3 inference.py --image test.jpg --compare

# Performance benchmark
python3 inference.py --image test.jpg --device npu --benchmark
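
The four flags used above could be parsed roughly as follows. This is a hypothetical sketch, not the actual contents of inference.py: the function name build_parser, the help strings, and the default device are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="RapidOCR inference (sketch)")
    parser.add_argument("--image", required=True, help="path to the input image")
    # Defaulting to "npu" is an assumption; the README always passes it explicitly.
    parser.add_argument("--device", choices=["npu", "cpu"], default="npu",
                        help="backend to run on")
    parser.add_argument("--compare", action="store_true",
                        help="run on both NPU and CPU and compare the outputs")
    parser.add_argument("--benchmark", action="store_true",
                        help="time repeated runs instead of a single pass")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--image", "test.jpg", "--device", "npu", "--benchmark"]
)
print(args.image, args.device, args.compare, args.benchmark)
```

Restricting --device with choices makes a typo such as "gpu" fail at parse time rather than after the models have been loaded.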

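Adaptation approach

Technical approach

ONNX model → onnx2torch.convert() → PyTorch module → .to(torch.device("npu:0")) → inference on the Ascend NPU

Implementation notes

Model conversion: onnx2torch.convert() automatically converts the ONNX models published by RapidOCR into PyTorch computation graphs
NPU backend: the models are moved to the Ascend NPU with .to(torch.device("npu:0"))
Preprocessing: OpenCV image preprocessing (resize, normalize, padding) that keeps numerical results consistent with PaddleOCR
Postprocessing:
- Detection: threshold the probability map → extract contours → fit minimum-area bounding rectangles
- Recognition: CTC greedy decode → map class indices to characters through the character table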
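
The recognition postprocessing step can be illustrated with a minimal CTC greedy decoder. This is a sketch in plain Python rather than the code in inference.py; the function name ctc_greedy_decode, the labels layout, and the choice of index 0 as the blank class are assumptions.

```python
from typing import List, Sequence


def ctc_greedy_decode(logits: Sequence[Sequence[float]],
                      labels: Sequence[str],
                      blank: int = 0) -> str:
    """Decode one sequence of per-time-step class scores.

    logits[t][k] is the score of class k at time step t, and labels[k] is the
    character for class k. labels[blank] is a placeholder that is never emitted
    (index 0 as blank is an assumption of this sketch).
    """
    # Step 1: take the highest-scoring class at every time step.
    best = [max(range(len(step)), key=lambda k: step[k]) for step in logits]

    chars: List[str] = []
    prev = None
    for idx in best:
        # Step 2: collapse consecutive repeats. Step 3: drop blanks.
        if idx != prev and idx != blank:
            chars.append(labels[idx])
        prev = idx
    return "".join(chars)


labels = ["<blank>", "a", "b"]
logits = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # a (kept, because a blank separates it)
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank
]
print(ctc_greedy_decode(logits, labels))  # aab
```

The blank between the two runs of "a" is what lets CTC represent a doubled character; without it the two runs would collapse into one.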

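Key code

import onnx2torch
import torch
import torch_npu  # registers the "npu" device type with PyTorch

# Load the ONNX models and convert them to PyTorch modules
det_model = onnx2torch.convert("ch_PP-OCRv5_det_mobile.onnx")
rec_model = onnx2torch.convert("ch_PP-OCRv5_rec_mobile.onnx")

# Switch to eval mode and move the models to the NPU
device = torch.device("npu:0")
det_model.eval().to(device)
rec_model.eval().to(device)

# Run inference; det_input and rec_input are the preprocessed image tensors
with torch.no_grad():
    prob_map = det_model(det_input.to(device))
    ctc_logits = rec_model(rec_input.to(device))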
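
The snippet takes det_input as given. One part of the resize step that produces it is choosing the target size. The sketch below assumes the detector wants both sides to be multiples of 32 and the longer side capped at 960; both values, and the helper names, are illustrative assumptions rather than settings read from inference.py.

```python
from typing import Tuple


def round_to_multiple(value: float, multiple: int) -> int:
    """Round to the nearest multiple (halves round up), never below one multiple."""
    return max(multiple, int(value / multiple + 0.5) * multiple)


def det_target_size(height: int, width: int,
                    limit_side_len: int = 960,
                    multiple: int = 32) -> Tuple[int, int]:
    """Return (new_height, new_width) for the detector input.

    limit_side_len and multiple are assumed values for this sketch.
    """
    # Shrink only when the longer side exceeds the cap; never enlarge.
    ratio = min(1.0, limit_side_len / max(height, width))
    return (round_to_multiple(height * ratio, multiple),
            round_to_multiple(width * ratio, multiple))


print(det_target_size(720, 1280))  # (544, 960)
print(det_target_size(10, 20))     # (32, 32)
```

Because the aspect ratio changes slightly after rounding, the scale factors for height and width should be kept separately so that detected boxes can be mapped back to the original image.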

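Accuracy evaluation

Evaluation method

NPU and CPU inference outputs are compared: detected boxes are paired by IoU, and the recognized text of each matched pair is checked for exact character-level agreement.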
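
The box-pairing step could look like the following sketch. It simplifies boxes to axis-aligned rectangles and uses greedy one-to-one matching; the real detector output is a set of quadrilaterals, and the matching strategy used by the evaluation script is not documented here, so both choices are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(npu_boxes: Sequence[Box], cpu_boxes: Sequence[Box],
                threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Greedy matching: each CPU box takes the unused NPU box with the highest
    IoU, provided that IoU is at least `threshold`."""
    used = set()
    pairs: List[Tuple[int, int]] = []
    for j, cpu_box in enumerate(cpu_boxes):
        best_i, best_score = -1, threshold
        for i, npu_box in enumerate(npu_boxes):
            if i in used:
                continue
            score = iou(npu_box, cpu_box)
            if score >= best_score:
                best_i, best_score = i, score
        if best_i >= 0:
            used.add(best_i)
            pairs.append((best_i, j))
    return pairs


npu = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
cpu = [(1, 0, 11, 10), (20, 21, 30, 31)]
print(match_boxes(npu, cpu))  # [(0, 0), (1, 1)]
```

In the example the third NPU box has no CPU counterpart and is simply left unmatched, so the matched count is bounded by the smaller of the two box lists.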

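Metric	Value
Detection output max diff (absolute)	0.026408
Detection output mean diff (absolute)	0.000036
Matched boxes (IoU ≥ 0.5)	20/20
Text-level match rate	100.0%
Character-level match rate	100.00%
Character-level error	0.00%
Accuracy gate (<1% error)	✅ Passed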
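
The character-level rows and the 1% gate can be reproduced with a few lines. The sketch assumes a position-by-position comparison in which any length difference counts as mismatched characters; the exact definition used by the evaluation script is not stated, and the function name char_match is hypothetical.

```python
from typing import Iterable, Tuple


def char_match(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (matched, total) characters over (npu_text, cpu_text) pairs.

    Characters are compared position by position; the longer string sets the
    total, so extra or missing characters count as errors (assumed definition).
    """
    matched = total = 0
    for npu_text, cpu_text in pairs:
        total += max(len(npu_text), len(cpu_text))
        matched += sum(1 for a, b in zip(npu_text, cpu_text) if a == b)
    return matched, total


# A few matched pairs taken from the log output in this README.
pairs = [("冰点标准", "冰点标准"), ("ca8b", "ca8b"), ("100", "100")]
matched, total = char_match(pairs)
error = 1.0 - matched / total
passed = error < 0.01  # the accuracy gate: less than 1% character error
print(f"{matched}/{total} matched, error {error:.2%}, passed={passed}")
```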

Evaluation screenshots

ch_en_num.jpg - NPU vs CPU comparison

NPU detected 21 boxes:
  [0.9826] 假一倍十
  [0.9790] 抗点吧
  [0.9843] 冰点标准
  [0.9703] 极速发店
  [0.9804] 京喜福利不容错设
  [0.9846] ca8b
  [0.9773] 100
  [0.9766] 100
  [0.9842] $力天运准会国
  [0.9753] 大福集車划套
  [0.9946] F品促E

CPU detected 20 boxes:
  [0.9826] 假一倍十
  [0.9792] 抗点吧
  [0.9845] 冰点标准
  [0.9700] 极速发店
  [0.9806] 京喜福利不容错设
  [0.9842] ca8b
  [0.9770] 100
  [0.9766] 100
  [0.9843] $力天运准会国
  [0.9751] 大福集車划套
  [0.9945] F品促E

Text-level match rate (IoU matched): 20/20 = 100.0%
Char-level match rate: 67/67 = 100.00%
Char-level error: 0.00%

Performance evaluation

Test environment

NPU: Ascend910_9362 × 2 (CANN 8.5.1)
CPU: ARM aarch64
Test image: ch_en_num.jpg (300×400, mixed Chinese and English)
Warm-up: 5 runs; benchmark: 20 runs
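
A warm-up-then-measure loop matching these settings might look like the sketch below. The helper name benchmark and its sync parameter are hypothetical, and reporting the mean (rather than the median or minimum) is an assumption. On the NPU a synchronization call such as torch.npu.synchronize would presumably be passed as sync, because device kernels run asynchronously and the clock would otherwise be read too early.

```python
import time
from statistics import mean
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              warmup: int = 5,
              runs: int = 20,
              sync: Optional[Callable[[], object]] = None) -> float:
    """Return the mean latency of fn() in milliseconds.

    `warmup` untimed calls come first, then `runs` timed calls. `sync`, if
    given, is called before each clock read so that asynchronous device work
    has finished (hypothetical hook for this sketch).
    """
    def wait() -> None:
        if sync is not None:
            sync()

    for _ in range(warmup):
        fn()
    wait()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        wait()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


# Stand-in workload so the sketch runs without a model or an NPU.
calls = []
latency_ms = benchmark(lambda: calls.append(1))
print(f"{len(calls)} calls, mean latency {latency_ms:.4f} ms")
```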

Latency

Stage	NPU (ms)	CPU (ms)	Speedup
Detection (DBNet)	20.35	125.70	6.18×
Recognition (SVTR/CRNN), per box	20.81	63.73	3.06×
End-to-end (21 boxes)	551	1424	2.58×

Throughput

Metric	NPU	CPU
Detection throughput	49.1 images/s	8.0 images/s
Recognition throughput	48.1 crops/s	15.7 crops/s
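
The throughput and speedup figures follow directly from the latency table: throughput is the reciprocal of the per-item latency, and speedup is the ratio of CPU to NPU latency. A small sketch that recomputes them, with helper names chosen for illustration:

```python
def throughput_per_s(latency_ms: float) -> float:
    """Items per second for a stage that processes one item per call."""
    return 1000.0 / latency_ms


def speedup(cpu_ms: float, npu_ms: float) -> float:
    """How many times faster the NPU is than the CPU."""
    return cpu_ms / npu_ms


# Latencies (ms) copied from the latency table: (NPU, CPU).
det = (20.35, 125.70)
rec = (20.81, 63.73)
e2e = (551.0, 1424.0)

print(round(throughput_per_s(det[0]), 1), round(throughput_per_s(det[1]), 1))  # 49.1 8.0
print(round(throughput_per_s(rec[0]), 1), round(throughput_per_s(rec[1]), 1))  # 48.1 15.7
print(round(speedup(det[1], det[0]), 2))  # 6.18
print(round(speedup(rec[1], rec[0]), 2))  # 3.06
print(round(speedup(e2e[1], e2e[0]), 2))  # 2.58
```

These are single-item figures; they say nothing about batched throughput, which was not measured here.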

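File structure

RapidOCR_NPU/
├── inference.py          # Main NPU inference script
├── eval.sh               # Automated accuracy + performance evaluation script
├── eval_output/           # Evaluation output directory
│   ├── eval_results.log  # Evaluation run log
│   ├── ch_en_num_npu.jpg # Annotated result visualization
│   ├── long_npu.jpg
│   └── test_npu.jpg
└── README.md             # This document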

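Known limitations

The text direction classifier is not integrated, so rotated text may need extra handling
Only the PP-OCRv5 Chinese models have been tested so far; multilingual models can be added as needed
The Slice operator produced by the ONNX → PyTorch conversion raises a compatibility warning under PyTorch 2.9; functionality is not affected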

Citation

@misc{rapidocr,
  author = {SWHL},
  title = {RapidOCR: Awesome OCR toolkits based on PyTorch / ONNX},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/RapidAI/RapidOCR}
}

The models were validated on Huawei Ascend NPU (Ascend910B / CANN 8.5.1), with accuracy error < 1% and inference speedup of 2.6× to 6.2×.