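en_PP-OCRv4_rec_infer: Ascend NPU Adaptation

Inference adaptation of the PP-OCRv4 English text recognition model for Huawei Ascend NPUs.

Original model: somohk/en_PP-OCRv4_rec_infer
Task: OCR text recognition (English)
Framework: PyTorch + torch_npu (Ascend CANN 8.5.1)
Accuracy: CPU vs. NPU output error < 0.001% ✅
Performance: NPU inference runs 3.50× as fast as CPU inference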

Environment Requirements

Component	Version
Python	3.11
PyTorch	2.9.0
torch_npu	2.9.0.post1
CANN	8.5.1
NPU	Ascend 910_9362 (×2)
Platform	Linux aarch64

# Verify that the NPU is available
python3 -c "import torch; import torch_npu; print(torch.npu.device_count())"
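The same check can be folded into a small device-selection helper. This is a minimal sketch, not code taken from inference.py: the function name `pick_device` and the `force_cpu` parameter are hypothetical, and it assumes that falling back to the CPU is acceptable when torch or torch_npu cannot be imported.

```python
def pick_device(force_cpu: bool = False) -> str:
    """Return "npu:0" when an Ascend NPU is usable, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it makes torch.npu available)
    except ImportError:
        # Assumption: without torch/torch_npu we silently fall back to the CPU.
        return "cpu"
    return "npu:0" if torch.npu.device_count() > 0 else "cpu"


print(pick_device())
```

The returned string can be passed to `model.to(...)` and `tensor.to(...)` so that the rest of a script stays device-agnostic.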

Quick Start

Install dependencies

pip install modelscope pillow numpy torch torch_npu

Download the model

modelscope download --model somohk/en_PP-OCRv4_rec_infer

Run inference

# NPU inference
python3 inference.py --image /path/to/text_image.jpg

# CPU inference (baseline for comparison)
python3 inference.py --image /path/to/text_image.jpg --cpu

# Accuracy evaluation
python3 inference.py --eval

# Performance benchmark
python3 inference.py --benchmark        # NPU
python3 inference.py --benchmark --cpu  # CPU
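For readers wrapping these commands in their own tooling, the flags above could be parsed roughly as follows. This is an illustrative reconstruction based only on the flags shown in the commands; the actual argument handling in inference.py may differ, and `build_parser` and `select_mode` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="PP-OCRv4 English recognition")
    parser.add_argument("--image", help="path to a cropped text-line image")
    parser.add_argument("--cpu", action="store_true", help="run on CPU instead of NPU")
    parser.add_argument("--eval", action="store_true", help="run the accuracy evaluation")
    parser.add_argument("--benchmark", action="store_true", help="run the benchmark")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Map the parsed flags to one of three modes (assumed precedence)."""
    if args.eval:
        return "eval"
    if args.benchmark:
        return "benchmark"
    if args.image:
        return "infer"
    raise SystemExit("expected --image, --eval, or --benchmark")


args = build_parser().parse_args(["--benchmark", "--cpu"])
print(select_mode(args), "on", "cpu" if args.cpu else "npu")
```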

Model Architecture

This implementation rebuilds the PaddleOCR PP-OCRv4_rec model in PyTorch while preserving equivalent computation:

Input [B, 3, 48, W]
  │
  ▼
Backbone (SVTR/LCNetV3)
  ├─ conv1: Conv2d(3→16, k=3, s=2) + BN + HardSwish
  ├─ Transition Layers: 1×1 Conv, progressively expanding channels
  ├─ Stage2: 1× SVTR Block (32ch, /2)
  ├─ Stage3: 2× SVTR Block (64ch, /2)
  ├─ Stage4: 2× SVTR Block (96ch, /2)
  ├─ Stage5: 5× SVTR Block (128ch)
  └─ Stage6: 4× SVTR Block + SE (192ch, /2)
  │
  ▼
CTC Encoder
  ├─ ConvBN(192→120) × 5
  └─ SVTR DWConv Block
  │
  ▼
CTC Head
  ├─ AdaptiveAvgPool2d (H→1)
  └─ Linear(120→96) → CTC Logits
  │
  ▼
Output [B, W, 96]  → Greedy CTC Decode → Text

Parameters: 12,728,384
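The final "Greedy CTC Decode" step in the diagram can be sketched in plain Python. The sketch assumes that class index 0 is the CTC blank and that index i maps to `charset[i - 1]`; the actual blank position and character dictionary used by this model are not stated here, so treat both as assumptions. The toy three-letter charset is for illustration only.

```python
BLANK = 0  # assumption: class 0 is the CTC blank


def ctc_greedy_decode(logits, charset):
    """Decode one sequence: logits is a list of T rows with C scores each."""
    # 1. Take the best class at every time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in logits]
    # 2. Collapse consecutive repeats, then 3. drop blanks.
    chars = []
    prev = None
    for idx in best:
        if idx != prev and idx != BLANK:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)


def one_hot(index, size=4):
    return [1.0 if i == index else 0.0 for i in range(size)]


# Best classes per step: 1 1 0 1 2 2 0 3, i.e. "a a _ a b b _ c"
steps = [one_hot(i) for i in (1, 1, 0, 1, 2, 2, 0, 3)]
print(ctc_greedy_decode(steps, "abc"))
```

A blank between two identical classes is what lets a doubled letter survive the repeat-collapsing step, which is why the example yields two "a" characters.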

Performance Evaluation

Test environment: Ascend 910_9362, CANN 8.5.1, PyTorch 2.9.0

Input shape: [1, 3, 48, 320], averaged over 100 inference runs

Metric	CPU	NPU	Speedup
Average latency	68.48 ms	19.54 ms	3.50×
P99 latency	69.81 ms	19.88 ms	3.51×
Throughput	14.60 img/s	51.18 img/s	3.50×
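The three metrics in the table can be derived from a list of per-run latencies. The following is a minimal sketch rather than the benchmark code in inference.py: the warm-up count, the nearest-rank P99 definition, and the `benchmark` helper itself are assumptions. With batch size 1, throughput is simply 1000 divided by the mean latency in milliseconds, which matches the table (1000 / 19.54 ≈ 51.18 img/s).

```python
import time


def benchmark(run_once, warmup=10, iters=100):
    """Time run_once() and return mean latency, P99 latency, and throughput."""
    for _ in range(warmup):  # warm-up runs are excluded from the statistics
        run_once()
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        # On an NPU, run_once should end with a device synchronization
        # (for example torch.npu.synchronize()) so the timer sees finished work.
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    mean_ms = sum(latencies_ms) / iters
    p99_index = (99 * iters + 99) // 100 - 1  # nearest-rank 99th percentile
    return {
        "mean_ms": mean_ms,
        "p99_ms": latencies_ms[p99_index],
        "throughput_img_s": 1000.0 / mean_ms,  # assumes one image per run
    }


# Dummy CPU workload standing in for a model forward pass.
stats = benchmark(lambda: sum(range(10_000)), warmup=2, iters=20)
print(sorted(stats))
```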

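Accuracy Evaluation

CPU vs. NPU consistency

Maximum relative error between CPU and NPU outputs, measured over 5 sets of random inputs:

Max error: 4.77 × 10⁻⁵ %  (far below the 1% threshold)
Status: ✅ PASS (CPU and NPU outputs are consistent)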
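One common way to compute such a figure is to divide the largest absolute difference by the largest magnitude in the reference output. The exact formula used by the evaluation script is not given here, so the definition below is an assumption. The sketch uses plain Python lists of flattened values and toy numbers so that it runs without torch.

```python
def max_relative_error_pct(reference, candidate, eps=1e-12):
    """Largest |ref - cand| relative to the largest |ref|, in percent."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    scale = max(abs(r) for r in reference)
    return 100.0 * max_abs_diff / max(scale, eps)  # eps guards an all-zero reference


# Toy values for illustration only, not real model outputs.
cpu_out = [1.0, -2.0, 4.0]
npu_out = [1.0, -2.0, 4.002]
err = max_relative_error_pct(cpu_out, npu_out)
print(err < 1.0)  # compare against the 1% threshold
```

Normalizing by the largest reference magnitude avoids the blow-up that an element-wise relative error shows when individual logits are close to zero.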

Output checks

Output shape: (1, 10, 96) ✅
Output range: [-0.40, +0.31] ✅ (a plausible range for logits)
CTC decoding: behaves as expected ✅

File Overview

en_PP-OCRv4_rec_infer_npu/
├── inference.py          # Inference script (NPU/CPU)
├── paddle_parser.py      # PaddlePaddle model parser
├── README.md             # This document
└── eval/                 # Evaluation materials
    ├── eval_report.txt   # Evaluation report
    └── benchmark_log.txt # Benchmark log

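Evaluation Self-Verification Screenshot

Run python3 inference.py --eval && python3 inference.py --benchmark

The screenshot should include:

cpu_npu_max_error_pct: 4.768945087409517e-05 (accuracy consistent)
cpu_npu_consistent: True
NPU average latency 19.27 ms, CPU average latency 70.58 ms (latencies vary slightly between runs)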

Environment Variables (optional tuning)

# NPU affinity binding
export CPU_AFFINITY_CONF=0,1,2,3

# Memory allocator optimization
export LD_PRELOAD=/usr/local/Ascend/cann-8.5.1/lib64/libtcmalloc.so

# NPU device
export ASCEND_DEVICE_ID=0

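Model Deployment

This model can serve as the recognition stage of an OCR pipeline on Ascend NPUs:

Text detection (see the PaddleOCR Det model adaptation)
Text recognition (this model)
Post-processing (CTC greedy decode)

Typical deployment architecture:

Image → text detection model (NPU) → text region cropping → this recognition model (NPU) → recognition results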
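The flow above amounts to a short loop over detected boxes. The sketch below shows only the glue logic under simplifying assumptions: `detect` and `recognize` are hypothetical callables standing in for the detection model and this recognition model, boxes are axis-aligned `(left, top, right, bottom)` tuples, and the image is a nested list of pixel rows rather than a tensor.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def crop(image, box: Box):
    """Cut an axis-aligned region out of an image stored as rows of pixels."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def run_pipeline(image, detect: Callable, recognize: Callable) -> List[Tuple[Box, str]]:
    """Detect text regions, crop each one, and recognize its text."""
    return [(box, recognize(crop(image, box))) for box in detect(image)]


# Stub models for illustration: fixed boxes, and a "recognizer" that
# just reports the size of the crop it received.
image = [[0] * 6 for _ in range(4)]  # 4 rows by 6 columns
stub_detect = lambda img: [(0, 0, 3, 2), (2, 1, 6, 4)]
stub_recognize = lambda region: f"{len(region[0])}x{len(region)}"

for box, text in run_pipeline(image, stub_detect, stub_recognize):
    print(box, text)
```

In a real deployment each crop would also be resized to the recognizer's input height of 48 before inference.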

License

Apache License 2.0 (same as the original model)

Contributions

Adaptation work: torch_npu operator replacement and PyTorch model reconstruction

tags: #NPU #Ascend #ocr-recognition #text-recognition #PP-OCRv4