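1. Model Overview

OCR (Optical Character Recognition) is a technology that converts text in images into editable text. It is widely used in document digitization, information extraction, and data processing. OCR can recognize printed text, handwritten text, and even certain special fonts and symbols.

The general OCR pipeline handles text recognition tasks: it extracts the text in an image and outputs it as plain text. PaddleOCR 3.0 released the PP-OCRv5_server model, which improves on PP-OCRv4_server by 13 percentage points across multiple scenarios.

The general OCR pipeline contains five modules (text detection, text recognition, document orientation classification, text line orientation classification, and text image unwarping). Each module can be trained and run for inference independently.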

2. Environment Preparation

2.1 Environment Versions

Component       Version
Device model    Atlas 800 3000 (300I Duo card)
CANN            8.0.0

2.2 Pulling the Image

Pull the official Paddle image. The Ascend operator library is already installed in the image by default.

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84

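2.3 Installing the Paddle Packages

Install the Paddle framework and the Ascend plugin. Version 3.2 is installed here, which already supports the OCRv5 models.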

pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
pip install paddle-custom-npu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/npu/

CANN 8.0.0 does not support some versions of numpy and opencv, so the following pinned versions must be installed.

python -m pip install numpy==1.26.4
python -m pip install opencv-python==3.4.18.65
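
To confirm that the pins above actually took effect, a small check along these lines may help. It is only a sketch using the standard library's importlib.metadata; the helper name find_mismatches is made up for illustration and is not part of Paddle or CANN.

```python
from importlib import metadata

# Pins taken from the install commands above.
PINNED = {"numpy": "1.26.4", "opencv-python": "3.4.18.65"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Report every package whose installed version differs from the pin.
for name, (expected, found) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {found}")
```

If nothing is printed, both packages match the pinned versions.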

After installation, check the installed version. The expected output is shown below.

python -c "import paddle_custom_device; paddle_custom_device.npu.version()"

[Screenshot: Paddle version check]

Install paddleocr

pip install "paddleocr[all]"

Work around the libgomp error on ARM machines

# "libgomp cannot allocate memory in static TLS block"
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD

Install the NPU high-performance inference plugin with the PaddleX command

paddlex --install hpi-npu

Upgrade paddle2onnx to the latest version; otherwise the ONNX conversion fails on unsupported operators.

pip install paddle2onnx --upgrade

3. Inference Service Deployment

3.1 Starting the Container

An example container start command is shown below.

docker run -itd --name paddle-npu-test \
    --privileged --network=host --shm-size=128G -w=/work \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84 /bin/bash

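3.2 Downloading the Models

The OCRv5 backend involves five models, and the corresponding model files must be prepared in advance.

3.2.1 Method 1

Run the OCRv5 models directly on the CPU; the models are then downloaded to the default directory. Use the command below with all optional parameters set to True so that every related model is downloaded in one go.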

paddleocr ocr -i ./general_ocr_002.png --use_doc_orientation_classify True --use_doc_unwarping True --use_textline_orientation True

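The log prints the model download directory.

[Screenshot: model download directory]

3.2.2 Method 2

The OCRv5 models can also be downloaded from the following links.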

# Text detection model
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_server_det_infer.tar

# Text recognition model
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_server_rec_infer.tar

# Document orientation classification model
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-LCNet_x1_0_doc_ori_infer.tar

# Text line orientation classification model
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-LCNet_x1_0_textline_ori_infer.tar

# Text image preprocessing (unwarping) model
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/UVDoc_infer.tar
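
The five downloads above follow one URL pattern, so they can also be scripted. The following is a minimal sketch using only the Python standard library; the function names and the destination directory passed to fetch_models are illustrative assumptions, while the base URL and model names come from the wget commands above.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL and model names copied from the wget commands above.
BASE_URL = ("https://paddle-model-ecology.bj.bcebos.com/paddlex/"
            "official_inference_model/paddle3.0.0")
MODEL_NAMES = [
    "PP-OCRv5_server_det",
    "PP-OCRv5_server_rec",
    "PP-LCNet_x1_0_doc_ori",
    "PP-LCNet_x1_0_textline_ori",
    "UVDoc",
]


def model_url(name):
    """Build the download URL of one inference model archive."""
    return f"{BASE_URL}/{name}_infer.tar"


def fetch_models(dest_dir):
    """Download each archive into dest_dir (if missing) and extract it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for name in MODEL_NAMES:
        archive = dest / f"{name}_infer.tar"
        if not archive.exists():
            urllib.request.urlretrieve(model_url(name), archive)
        with tarfile.open(archive) as tar:
            tar.extractall(dest)
```

Calling fetch_models("/work/models") (a hypothetical path) would download and unpack all five archives; nothing is downloaded until the function is called.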

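3.3 Model Conversion

3.3.1 Converting the Model to ONNX

Taking the PP-OCRv5_server_det model as an example, use the paddle2onnx plugin provided by PaddleX to convert the original model to an ONNX model.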

paddlex --paddle2onnx --paddle_model_dir /root/.paddlex/official_models/PP-OCRv5_server_det --onnx_model_dir /work/OCRv5-pipeline/PP-OCRv5_server_det_infer --opset_version 11

After the conversion, the target directory contains the inference.onnx and inference.yml files.

[Screenshot: Paddle to ONNX conversion]
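
Since all five models need the same conversion, the command can be generated per model. This is a sketch only: the flags are the ones used in the command above, but the assumption that every model lives under /root/.paddlex/official_models/<name> and that the output directory is named <name>_infer simply mirrors the PP-OCRv5_server_det example and may need adjusting.

```python
import shlex

MODEL_NAMES = [
    "PP-OCRv5_server_det",
    "PP-OCRv5_server_rec",
    "PP-LCNet_x1_0_doc_ori",
    "PP-LCNet_x1_0_textline_ori",
    "UVDoc",
]


def paddle2onnx_command(name, src_root, dst_root, opset=11):
    """Build the paddlex --paddle2onnx argument list for one model."""
    return [
        "paddlex", "--paddle2onnx",
        "--paddle_model_dir", f"{src_root}/{name}",
        "--onnx_model_dir", f"{dst_root}/{name}_infer",  # assumed naming
        "--opset_version", str(opset),
    ]


# Print the commands so they can be reviewed before running them.
for name in MODEL_NAMES:
    cmd = paddle2onnx_command(
        name, "/root/.paddlex/official_models", "/work/OCRv5-pipeline")
    print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) once the paths have been verified.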

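3.3.2 Converting the Model to OM

Again taking the PP-OCRv5_server_det model as an example, use the atc tool to convert the ONNX model to an OM model. The soc_version parameter specifies the chip model. Because the Paddle framework does not yet support dynamic shapes on 310P-series chips, inference currently has to use a fixed shape, so input_shape must be specified in the atc conversion.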

atc --model=/work/OCRv5-pipeline/PP-OCRv5_server_det_infer/inference.onnx --framework=5 --output=/work/OCRv5-pipeline/PP-OCRv5_server_det_infer/inference --soc_version=Ascend310P3 --input_shape="x:1,3,736,960"

[Screenshot: ONNX to OM conversion]
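
When several models have to be converted with different fixed shapes, it can be convenient to build the atc invocation from its parts. The sketch below only assembles the same flags shown in the command above; the helper name atc_command is hypothetical, and the shape must be chosen per model.

```python
def atc_command(onnx_path, output_prefix, input_name, shape,
                soc_version="Ascend310P3"):
    """Build an atc argument list for a fixed-shape ONNX to OM conversion.

    shape is the full N, C, H, W shape, for example (1, 3, 736, 960).
    """
    dims = ",".join(str(d) for d in shape)
    return [
        "atc",
        f"--model={onnx_path}",
        "--framework=5",  # 5 selects ONNX as the source framework
        f"--output={output_prefix}",
        f"--soc_version={soc_version}",
        f"--input_shape={input_name}:{dims}",
    ]


base = "/work/OCRv5-pipeline/PP-OCRv5_server_det_infer"
cmd = atc_command(f"{base}/inference.onnx", f"{base}/inference",
                  "x", (1, 3, 736, 960))
print(" ".join(cmd))
```

Because the arguments are passed as a list (for example to subprocess.run), the input_shape value needs no extra shell quoting.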

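Appendix: how to find a model's input_shape

Open the ONNX model in Netron and inspect the input shape. Taking the model below as an example:

x: the input name (input_name)
p2o.DynamicDimension.0: batch size, the N in NCHW format; can be dynamic
3: number of channels (BGR order), the C in NCHW format; a fixed value
p2o.DynamicDimension.1: image height, the H in NCHW format; can be dynamic
p2o.DynamicDimension.2: image width, the W in NCHW format; can be dynamic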
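
The same information can be read without a GUI. The following sketch assumes the onnx Python package is installed; dynamic dimensions show up as names such as p2o.DynamicDimension.0 and fixed ones as integers. The function names are illustrative.

```python
def dims_of(value_info):
    """Return the dimensions of one graph input: a name if dynamic, an int if fixed."""
    dims = value_info.type.tensor_type.shape.dim
    return [d.dim_param if d.dim_param else d.dim_value for d in dims]


def input_shapes(onnx_path):
    """Map every graph input name of an ONNX file to its dimensions."""
    import onnx  # imported here so dims_of stays usable without onnx installed

    model = onnx.load(onnx_path)
    return {inp.name: dims_of(inp) for inp in model.graph.input}
```

For example, input_shapes("/work/OCRv5-pipeline/PP-OCRv5_server_det_infer/inference.onnx") would list the input x together with its four dimensions.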

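3.4 Single-Model Inference

With the conversion complete, single-model inference can now be run.

Note: OCR models receive the model input shape through input_shape, and it must match the shape specified in the atc conversion.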

from paddlex import create_model

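hpi_config = {
    "auto_config": False,  # disable automatic configuration and configure the backend manually
    "backend": "om",  # use the om backend
}

# model_name: name of the model to use
# model_dir: path to the directory holding the model and its configuration file
# device: "npu:0" or "npu"; card 0 is used by default when no card index is given
# use_hpip: set to True to enable the high-performance inference plugin
# input_shape: model input shape as a list [c, h, w]; it must match the input shape
#   specified in the atc conversion, and currently only OCR models need this parameter
model = create_model(
    model_name="PP-OCRv5_server_det",
    model_dir="/work/OCRv5-pipeline/PP-OCRv5_server_det_infer_om",
    device="npu:0",
    use_hpip=True,
    hpi_config=hpi_config,
    input_shape=[3, 736, 960],
)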
output = model.predict("./images/general_ocr_001.png")

for res in output:
    res.print(json_format=False)
    res.save_to_img("./output/")
    res.save_to_json("./output/res.json")

Single-model inference result

[Screenshot: single-model inference run]
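
Because input_shape has to agree with the shape given to atc, deriving one from the other avoids typos. A minimal sketch, assuming the atc value has the form name:N,C,H,W as in section 3.3.2; the helper name is hypothetical.

```python
def atc_shape_to_input_shape(atc_input_shape):
    """Convert an atc --input_shape value such as 'x:1,3,736,960' to [C, H, W]."""
    _name, _, dims_text = atc_input_shape.partition(":")
    dims = [int(d) for d in dims_text.split(",")]
    if len(dims) != 4:
        raise ValueError(f"expected N,C,H,W but got {dims_text!r}")
    return dims[1:]  # drop the batch dimension


print(atc_shape_to_input_shape("x:1,3,736,960"))  # prints [3, 736, 960]
```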

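3.5 Pipeline Inference

Because PaddleOCR relies on several models that must be used together, the Paddle high-performance inference plugin supports PaddleX pipeline inference.

Taking the OCRv5 models as an example, convert the five models above to ONNX and OM in advance. The directory layout (one directory per model under /work/OCRv5-pipeline) corresponds to the model_dir paths in the configuration files below.

3.5.1 All-OM Backend Configuration

The pipeline configuration file OCR.yml for an all-OM backend is as follows.

# Set hpi_config at the top level and specify om as the inference backend
# Disable the modules that om does not support yet, mainly keeping the detection and recognition modules
# Set input_shape in the detection and recognition modules to fix a static shape
# Set model_dir in the detection and recognition modules to the path of the model and configuration files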

pipeline_name: OCR

text_type: general

use_doc_preprocessor: False
use_textline_orientation: False

hpi_config:
  auto_config: False
  backend: om

SubPipelines:
  DocPreprocessor:
    pipeline_name: doc_preprocessor
    use_doc_orientation_classify: False
    use_doc_unwarping: False
    SubModules:
      DocOrientationClassify:
        module_name: doc_text_orientation
        model_name: PP-LCNet_x1_0_doc_ori
        model_dir: /work/OCRv5-pipeline/PP-LCNet_x1_0_doc_ori_om
      DocUnwarping:
        module_name: image_unwarping
        model_name: UVDoc
        model_dir: /work/OCRv5-pipeline/UVDoc_om

SubModules:
  TextDetection:
    module_name: text_detection
    model_name: PP-OCRv5_server_det
    model_dir: /work/OCRv5-pipeline/PP-OCRv5_server_det_infer_om
    limit_side_len: 960
    limit_type: max
    max_side_limit: 4000
    thresh: 0.3
    box_thresh: 0.6
    unclip_ratio: 1.5
    input_shape: [3, 640, 480]
  TextLineOrientation:
    module_name: textline_orientation
    model_name: PP-LCNet_x1_0_textline_ori
    model_dir: /work/OCRv5-pipeline/PP-LCNet_x1_0_textline_ori_om
    batch_size: 1
  TextRecognition:
    module_name: text_recognition
    model_name: PP-OCRv5_server_rec
    model_dir: /work/OCRv5-pipeline/PP-OCRv5_server_rec_infer_om
    batch_size: 1
    score_thresh: 0.0
    input_shape: [3, 48, 320]
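
A quick consistency check between the input_shape entries in this configuration and the shapes used during atc conversion can catch mismatches early. The sketch below works on the configuration as a plain dict (for instance loaded with yaml.safe_load from PyYAML); the atc shapes listed in ATC_SHAPES are assumed values for illustration, not taken from an actual conversion log.

```python
def check_input_shapes(config, atc_shapes):
    """Compare each module's input_shape with the N, C, H, W shape given to atc.

    Returns a list of human-readable problems; an empty list means all match.
    """
    problems = []
    modules = config.get("SubModules", {})
    for key, nchw in atc_shapes.items():
        shape = modules.get(key, {}).get("input_shape")
        expected = list(nchw[1:])  # input_shape omits the batch dimension
        if shape is None:
            problems.append(f"{key}: input_shape is missing")
        elif list(shape) != expected:
            problems.append(f"{key}: input_shape {list(shape)} != atc shape {expected}")
    return problems


# Values mirror the configuration above; the atc shapes are assumptions.
CONFIG = {
    "SubModules": {
        "TextDetection": {"input_shape": [3, 640, 480]},
        "TextRecognition": {"input_shape": [3, 48, 320]},
    }
}
ATC_SHAPES = {
    "TextDetection": [1, 3, 640, 480],
    "TextRecognition": [1, 3, 48, 320],
}
print(check_input_shapes(CONFIG, ATC_SHAPES))  # prints []
```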

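3.5.2 OM + onnxruntime Backend Configuration

There is one caveat: because the PaddleX framework does not yet support dynamic shapes for OM inference, inference accuracy suffers for images of different sizes. To work around this, set the inference backend of the dynamic-shape models to onnxruntime (that is, run the corresponding ONNX models on the CPU, at some cost in performance).

The pipeline configuration file OCR.yml for the OM + onnxruntime backend is as follows. The backend and device_type fields under hpi_config specify the backend model format and the inference device.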

pipeline_name: OCR

text_type: general

use_doc_preprocessor: True
use_textline_orientation: True

SubPipelines:
  DocPreprocessor:
    pipeline_name: doc_preprocessor
    use_doc_orientation_classify: True
    use_doc_unwarping: True
    SubModules:
      DocOrientationClassify:
        module_name: doc_text_orientation
        model_name: PP-LCNet_x1_0_doc_ori
        model_dir: /work/OCRv5-pipeline/PP-LCNet_x1_0_doc_ori_om
        hpi_config:
          auto_config: False
          backend: om
          device_type: npu
      DocUnwarping:
        module_name: image_unwarping
        model_name: UVDoc
        model_dir: /work/OCRv5-pipeline/UVDoc_om
        hpi_config:
          auto_config: False
          backend: onnxruntime
          device_type: cpu


SubModules:
  TextDetection:
    module_name: text_detection
    model_name: PP-OCRv5_server_det
    model_dir: /work/OCRv5-pipeline/PP-OCRv5_server_det_infer_om
    limit_side_len: 960
    limit_type: max
    max_side_limit: 4000
    thresh: 0.3
    box_thresh: 0.6
    unclip_ratio: 1.5
    hpi_config:
      auto_config: False
      backend: onnxruntime
      device_type: cpu
  TextLineOrientation:
    module_name: textline_orientation
    model_name: PP-LCNet_x1_0_textline_ori
    model_dir: /work/OCRv5-pipeline/PP-LCNet_x1_0_textline_ori_om
    batch_size: 1
    hpi_config:
      auto_config: False
      backend: om
      device_type: npu
  TextRecognition:
    module_name: text_recognition
    model_name: PP-OCRv5_server_rec
    model_dir: /work/OCRv5-pipeline/PP-OCRv5_server_rec_infer_om
    batch_size: 1
    score_thresh: 0.0
    hpi_config:
      auto_config: False
      backend: onnxruntime
      device_type: cpu
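
With backends set per module, it helps to list which backend each module ends up with. The sketch below walks a configuration dict and merges hpi_config from outer to inner levels; that inner settings override outer ones is an assumption made here for illustration, and the function name is hypothetical.

```python
def effective_backends(config):
    """Return {module_name: (backend, device_type)} for every module in the config."""
    result = {}

    def visit(node, inherited):
        # Settings at this level extend or override the inherited ones.
        current = {**inherited, **node.get("hpi_config", {})}
        for name, module in node.get("SubModules", {}).items():
            merged = {**current, **module.get("hpi_config", {})}
            result[name] = (merged.get("backend"), merged.get("device_type"))
        for sub in node.get("SubPipelines", {}).values():
            visit(sub, current)

    visit(config, {})
    return result


# Reduced copy of the configuration above, keeping only the hpi_config parts.
sample = {
    "SubPipelines": {"DocPreprocessor": {"SubModules": {
        "DocOrientationClassify": {"hpi_config": {"backend": "om", "device_type": "npu"}},
        "DocUnwarping": {"hpi_config": {"backend": "onnxruntime", "device_type": "cpu"}},
    }}},
    "SubModules": {
        "TextDetection": {"hpi_config": {"backend": "onnxruntime", "device_type": "cpu"}},
        "TextLineOrientation": {"hpi_config": {"backend": "om", "device_type": "npu"}},
        "TextRecognition": {"hpi_config": {"backend": "onnxruntime", "device_type": "cpu"}},
    },
}
for name, (backend, device) in effective_backends(sample).items():
    print(f"{name}: backend={backend}, device={device}")
```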

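3.5.3 Running Pipeline Inference

The pipeline inference script is as follows.

from paddlex import create_pipeline

# pipeline: path to the pipeline configuration file from section 3.5.1 or 3.5.2
#   (adjust the file name to match where you saved it)
# use_hpip: enable high-performance inference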
pipeline = create_pipeline(pipeline="./OCRv5.yaml", use_hpip=True)

output = pipeline.predict(
    input="./images/general_ocr_002.png",
    use_doc_orientation_classify=True,
    use_doc_unwarping=True,
    use_textline_orientation=True,
)

for res in output:
    res.print()
    res.save_to_img("./output/")
    res.save_to_json("./output/")

Pipeline inference output

[Screenshot: pipeline inference result]
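
The JSON files written by save_to_json can be post-processed, for example to keep only confidently recognized lines. This is a sketch under assumptions: the key names rec_texts and rec_scores, and the optional res wrapper, reflect what the OCR result is expected to contain and should be checked against the actual files in ./output/.

```python
import json
from pathlib import Path


def confident_texts(result, min_score=0.5):
    """Return recognized texts whose score is at least min_score."""
    body = result.get("res", result)  # tolerate an optional "res" wrapper
    texts = body.get("rec_texts", [])    # assumed key name
    scores = body.get("rec_scores", [])  # assumed key name
    return [t for t, s in zip(texts, scores) if s >= min_score]


def load_results(output_dir):
    """Load every JSON file in output_dir, keyed by file name."""
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(output_dir).glob("*.json"))
    }
```

For instance, iterating over load_results("./output/") and passing each value to confident_texts would give the filtered text lines per image.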

Visualized inference result (note: the test image comes from the official Paddle test set, download link)

[Screenshot: visualized inference result]