gcw_GSiqzzLf/en_ppocr_mobile_v2.0_table_rec_infer-npu

en-ppocr-mobile-v2.0-table-rec-infer-NPU

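Model Overview

en_ppocr_mobile_v2.0_table_rec_infer is PaddleOCR's text recognition model for tables (Table Text Recognition). It is a lightweight OCR recognizer built on a MobileNetV3 backbone with a CTC decoder, designed to recognize the text inside table cells: English letters, digits, punctuation, and common mathematical symbols.

This repository is an adaptation for Ascend NPUs. It runs accelerated inference on the Ascend 910 chip through the ONNX Runtime CANN Execution Provider.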

Original Model

ModelScope: https://www.modelscope.cn/models/cycloneboy/en_ppocr_mobile_v2.0_table_rec_infer/files

Task Type

OCR text recognition (Table Text Recognition): recognizes the text content of table cells

Model Architecture

Architecture: CRNN (CNN + BiLSTM + CTC)
Backbone: MobileNetV3
Sequence modeling: BiLSTM (DynamicRNN)
Decoder: CTC (Connectionist Temporal Classification)
Character set: 279 classes (blank + 277 table_dict characters + space)
Input size: [N, 3, 32, W] (height fixed at 32, dynamic width)
Model format: ONNX (opset 14)
Model size: 5.8 MB (FP32)
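The 279 output classes break down as 1 blank + 277 dictionary characters + 1 space. A minimal sketch of assembling that label list, assuming PaddleOCR's usual CTC layout (blank at index 0, space appended last); `build_char_list` and the placeholder dictionary are hypothetical stand-ins for the real table_dict contents.

```python
def build_char_list(dict_chars, use_space_char=True):
    """Return the CTC label list: index 0 is blank, then the dictionary
    characters in file order, then (optionally) a trailing space."""
    chars = ["blank"] + list(dict_chars)
    if use_space_char:
        chars.append(" ")
    return chars


# Hypothetical placeholder for the 277 entries of the real table dictionary.
placeholder_dict = [f"c{i}" for i in range(277)]
char_list = build_char_list(placeholder_dict)
print(len(char_list))
```

The class index produced by the model's argmax is used directly as an index into this list, so the order must match the one used at training time.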

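Input Format

Image: RGB, height resized to 32 pixels, width scaled proportionally
Input tensor shape: [batch_size, 3, 32, width]
Normalization: (x / 255.0 - 0.5) / 0.5, mapping pixel values to [-1, 1]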
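The normalization above maps pixel values from [0, 255] to [-1, 1]. A numpy-only sketch of the width calculation and the normalization/layout step, leaving the actual resize to Pillow as in the core inference code; `target_width` and `normalize_to_nchw` are hypothetical helper names, and the 2-pixel minimum width mirrors the core code.

```python
import numpy as np

TARGET_H = 32  # fixed input height of the model


def target_width(w, h, target_h=TARGET_H):
    """Width after scaling the height to target_h with the aspect ratio kept (at least 2 px)."""
    return max(int(w * target_h / h), 2)


def normalize_to_nchw(img_hwc):
    """uint8 [H, W, 3] RGB array -> float32 [1, 3, H, W] with values in [-1, 1]."""
    x = img_hwc.astype(np.float32) / 255.0
    x = (x - 0.5) / 0.5
    return np.transpose(x, (2, 0, 1))[np.newaxis, ...]


# A 200x64 crop is scaled to 100x32 before normalization.
print(target_width(200, 64))
```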

Output Format

Output tensor shape: [batch_size, T, 279] (T is the number of time steps)
Decoding: CTC greedy decoding (argmax, collapse repeats, remove blanks)
Output text: the recognized string
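Greedy CTC decoding takes the best class at each time step, collapses consecutive repeats, and then drops blanks, so a blank between two identical characters keeps both (a, a, blank, a decodes to "aa"). A vectorized numpy sketch that handles a whole batch; `ctc_greedy_decode` is a hypothetical helper, and `char_list` is assumed to follow the blank + dictionary + space layout described above. Because argmax is unchanged by softmax, it works on either logits or probabilities.

```python
import numpy as np


def ctc_greedy_decode(scores, char_list, blank=0):
    """scores: [N, T, C] logits or probabilities. Returns a list of N strings."""
    texts = []
    for seq in scores.argmax(axis=-1):           # [T] best class per time step
        keep = np.ones(len(seq), dtype=bool)
        keep[1:] = seq[1:] != seq[:-1]            # drop repeats of the previous step
        keep &= seq != blank                      # then drop blank steps
        texts.append("".join(char_list[i] for i in seq[keep]))
    return texts
```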

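Dependencies

Python >= 3.10
onnxruntime >= 1.15.0 (CPU-only inference)
onnxruntime-cann >= 1.15.0 (NPU inference; provides CANNExecutionProvider and requires the Ascend driver and CANN toolkit to be installed)
numpy >= 1.20.0
Pillow >= 9.0.0

Install only one of the two ONNX Runtime packages in a given environment; onnxruntime-cann also includes CPUExecutionProvider.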

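NPU Adaptation Notes

This adaptation runs inference on the Ascend NPU with ONNX Runtime:

CPU inference: loads the ONNX model with CPUExecutionProvider and runs on the CPU
NPU inference: loads the same ONNX model with CANNExecutionProvider and runs on the Ascend 910 NPU
Same model: CPU and NPU use the identical ONNX file, so the comparison is fair
Accuracy: the maximum softmax probability difference on the test images is below 0.02%, and the predictions are identical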
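The provider lists can be wrapped in a small helper that fails loudly when the CANN provider is missing, instead of letting ONNX Runtime continue on the CPU with only a warning. This is a sketch, not code from inference.py; `select_providers` is a hypothetical name. In practice, pass `onnxruntime.get_available_providers()` as `available`, and after creating the session check `session.get_providers()` to see which providers are actually in use.

```python
def select_providers(device, available):
    """Return a providers list for onnxruntime.InferenceSession.

    device: "cpu" or "npu".
    available: provider names, e.g. onnxruntime.get_available_providers().
    """
    if device == "cpu":
        return ["CPUExecutionProvider"]
    if device == "npu":
        if "CANNExecutionProvider" not in available:
            raise RuntimeError(
                "CANNExecutionProvider is not available; check that onnxruntime-cann "
                "is installed and the CANN environment is set up"
            )
        # CPU stays in the list so that nodes CANN cannot run are assigned to the CPU provider.
        return ["CANNExecutionProvider", "CPUExecutionProvider"]
    raise ValueError(f"unknown device: {device!r}")
```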

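Environment Setup

# Install dependencies (onnxruntime-cann also provides CPUExecutionProvider;
# for CPU-only use, install onnxruntime instead of onnxruntime-cann)
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple onnxruntime-cann numpy Pillow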

Running Inference

CPU inference

python3 inference.py --image /path/to/image.png --device cpu

NPU inference

python3 inference.py --image /path/to/image.png --device npu

Inference Results

Device	Test image	Inference time
CPU	test_text_widget.png	0.0055s
NPU	test_text_widget.png	0.1929s*
CPU	test_text_qty.png	0.0039s
NPU	test_text_qty.png	0.0598s
CPU	test_text_hello.png	0.0069s
NPU	test_text_hello.png	0.0599s
CPU	test_table.png	0.0063s
NPU	test_table.png	0.0595s

*Note: The first NPU inference includes operator compilation (about 35 s); subsequent inferences take 0.06 to 0.2 s. CPU times exclude model loading.
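Because the first NPU call absorbs roughly 35 s of operator compilation, steady-state timings need at least one untimed warm-up call. A minimal sketch of such a measurement; `benchmark` is a hypothetical helper, not part of this repository, and the README does not state exactly how its own numbers were measured.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeats=10):
    """Call fn() `warmup` times without timing (absorbing one-off costs such as
    operator compilation), then return the mean wall-clock time in seconds
    over `repeats` timed calls."""
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```

With the session and preprocessed `img_np` from the core inference code, usage would look like `benchmark(lambda: session.run(None, {"x": img_np}))`.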

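Deployment and Inference

Core inference code

import numpy as np
import onnxruntime
from PIL import Image

# 1. Load the model (CPU and NPU use the same ONNX file)
session = onnxruntime.InferenceSession(
    "model.onnx",
    providers=["CANNExecutionProvider", "CPUExecutionProvider"]  # NPU
    # providers=["CPUExecutionProvider"]  # CPU
)
# Read the input name from the model instead of hard-coding "x"
input_name = session.get_inputs()[0].name

# 2. Build the character list: index 0 = CTC blank, then one character per line
#    of the table dictionary, then space (1 + 277 + 1 = 279 classes).
#    "table_dict.txt" is an assumed local copy of PaddleOCR's table dictionary.
with open("table_dict.txt", encoding="utf-8") as f:
    char_list = ["blank"] + [line.rstrip("\r\n") for line in f] + [" "]

# 3. Preprocess: RGB, height 32, width scaled proportionally, values in [-1, 1]
img = Image.open("image.png").convert("RGB")
w, h = img.size
ratio = 32 / h
target_w = max(int(w * ratio), 2)
img = img.resize((target_w, 32), Image.LANCZOS)
img_np = np.array(img, dtype=np.float32) / 255.0
img_np = (img_np - 0.5) / 0.5
img_np = np.transpose(img_np, (2, 0, 1))  # HWC -> CHW
img_np = np.expand_dims(img_np, axis=0)   # add batch dimension: [1, 3, 32, W]

# 4. Inference
outputs = session.run(None, {input_name: img_np})
logits = outputs[0]  # [1, T, 279]

# 5. CTC greedy decoding: skip repeats of the previous index and the blank (index 0)
pred_idx = logits[0].argmax(axis=1)
prev = -1
result = []
for c in pred_idx:
    if c != prev and c != 0:
        result.append(char_list[c])
    prev = c
text = "".join(result)
print(text)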

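CPU/NPU Accuracy Test Method

Run the comparison script:

python3 compare_cpu_npu.py

For each test image, the script:

Runs inference on both CPU and NPU
Computes the softmax probability difference between the CPU and NPU outputs
Computes the cosine similarity of the logits
Computes prediction agreement (argmax match rate)
Prints a detailed comparison and saves it to compare_results.json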
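The metrics listed above can be computed from the two [N, T, 279] outputs roughly as follows. This is a sketch, not compare_cpu_npu.py itself: `compare_outputs` is a hypothetical name, and reporting absolute probability differences multiplied by 100 as percentages is an assumption about how the table's numbers were derived. PaddleOCR's CTC head normally applies softmax inside exported inference models; if the output rows already sum to 1, pass `apply_softmax=False`.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def compare_outputs(cpu_out, npu_out, apply_softmax=True):
    """cpu_out, npu_out: [N, T, C] outputs of the same model on the two providers."""
    a = cpu_out.astype(np.float64)
    b = npu_out.astype(np.float64)
    pa, pb = (softmax(a), softmax(b)) if apply_softmax else (a, b)
    diff = np.abs(pa - pb)
    # Cosine similarity over the flattened raw outputs
    cos = float(np.dot(a.ravel(), b.ravel()) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return {
        "max_prob_diff_pct": float(diff.max() * 100),
        "mean_prob_diff_pct": float(diff.mean() * 100),
        "cosine_similarity": cos,
        "argmax_agreement_pct": float((a.argmax(-1) == b.argmax(-1)).mean() * 100),
    }
```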

CPU/NPU Accuracy Test Results

Test image	Max probability difference	Mean probability difference	Logit cosine similarity	Prediction agreement	Result
test_text_widget.png	0.0078%	0.00002%	0.99996	100.0000%	PASS
test_text_qty.png	0.0086%	0.00002%	0.99997	100.0000%	PASS
test_text_hello.png	0.0079%	0.00002%	0.99992	100.0000%	PASS
test_table.png	0.0113%	0.00003%	0.99995	100.0000%	PASS

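Conclusion: the maximum softmax probability difference between NPU and CPU inference is 0.0113% (test_table.png), within the required tolerance of less than 1%.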

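Performance Results

Metric	CPU (FP32)	NPU (FP32)
Mean inference time	0.0057s	0.0929s*
Model precision	float32	float32

*Note: NPU timings include CANN execution overhead, so for a model this small the CPU is faster. The NPU is better suited to larger models or batched inference.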
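Since the input is [N, 3, 32, W] with a dynamic width, text crops of different widths must be padded to a common width before they can share a single `session.run` call. A minimal sketch, assuming each crop has already been resized and normalized to [3, 32, W_i] as described in the input format; `make_batch` is a hypothetical helper, and padding with 0.0 (mid-gray after normalization) follows what PaddleOCR's own recognizer does, to the best of our knowledge. Whether this particular ONNX export accepts N > 1 should be checked with `session.get_inputs()[0].shape`.

```python
import numpy as np


def make_batch(images_chw, pad_value=0.0):
    """Stack normalized float32 [3, 32, W_i] arrays into one [N, 3, 32, max(W_i)]
    batch, right-padding narrower images with pad_value."""
    max_w = max(img.shape[2] for img in images_chw)
    c, h = images_chw[0].shape[:2]
    batch = np.full((len(images_chw), c, h, max_w), pad_value, dtype=np.float32)
    for i, img in enumerate(images_chw):
        batch[i, :, :, : img.shape[2]] = img
    return batch
```

Sorting crops by width before batching, as PaddleOCR's recognizer does, keeps padding (and the extra time steps it produces) small.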


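Evidence of Successful Inference

This repository provides complete inference scripts that run on both CPU and NPU:

# NPU inference
python3 inference.py --device npu

# CPU inference
python3 inference.py --device cpu

When inference finishes, the script prints the recognition result and the elapsed time, showing that the model ran successfully on the NPU.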

Model Tags

#+NPU #+CV #+OCR #+Ascend #+PaddleOCR #+TableRecognition #+TextRecognition #+ONNX

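License

This repository provides only the adaptation scripts and usage examples; the copyright of the model weights belongs to the PaddleOCR team. Use of this model is subject to PaddleOCR's open-source license (Apache 2.0).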