OFA-OCR-Recognition (Ascend NPU Adaptation)

Adaptation and evaluation of the OFA Chinese text recognition model on the Huawei Ascend 910 NPU.

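Model Overview

OFA (One-For-All) is a general-purpose multimodal pretrained model proposed by Alibaba DAMO Academy. It handles image, text, and other multimodal tasks within a single sequence-to-sequence learning framework. ofa_ocr-recognition_web_base_zh is the official OFA checkpoint for Chinese text recognition (OCR) and reaches SOTA accuracy on several public datasets (RCTW, ReCTS, LSVT, ArT, CTW).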

Model source: ModelScope - iic/ofa_ocr-recognition_web_base_zh
Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Official repository: OFA-Sys/OFA
Architecture: Encoder-Decoder (ResNet101 + Transformer)
Parameters: ~140M
Language: Chinese

Adaptation Notes

This repository provides an inference adaptation of the OFA OCR Recognition model for the Huawei Ascend 910 NPU.

Requirements

Component	Version
CANN	8.5.1
PyTorch	2.9.0
torch_npu	2.9.0.post1
transformers	4.57.6
modelscope	1.35.3
Python	3.11

Environment Check

# Check the NPU devices
npu-smi info

# Check PyTorch NPU support
python3 -c "import torch; import torch_npu; print(torch.npu.is_available())"
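
The same check can be made from Python when a script needs to fall back to CPU. The following is a minimal sketch: `pick_device` is a hypothetical helper name and is not part of `inference.py`; only the `torch.npu.is_available()` call is taken from the command above.

```python
import importlib.util


def pick_device(prefer_cpu: bool = False) -> str:
    """Return "npu:0" when an Ascend NPU is usable, otherwise "cpu".

    Hypothetical helper for illustration; not part of inference.py.
    """
    if prefer_cpu:
        return "cpu"
    # Avoid an ImportError on machines without the Ascend stack.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    if importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers torch.npu)
        return "npu:0" if torch.npu.is_available() else "cpu"
    except Exception:
        # A broken driver or CANN install can fail at import time.
        return "cpu"


print(pick_device())
```

Passing `prefer_cpu=True` mirrors the role of the `--cpu` flag used for the reference runs.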

Model Download

pip install modelscope
modelscope download --model iic/ofa_ocr-recognition_web_base_zh

Quick Start

# Run inference on a single image
python3 inference.py --image path/to/image.jpg

# Use an online image
python3 inference.py --image https://example.com/ocr_image.png

# Run the batch benchmark
python3 inference.py --benchmark

# Run on CPU (for accuracy comparison)
python3 inference.py --benchmark --cpu

Python API

from inference import load_model, ocr_inference

# Load the model
model, tokenizer, generator, config = load_model()

# Run inference
text, inference_time = ocr_inference(model, tokenizer, generator, 'image.jpg')
print(f"Recognized text: {text}")
print(f"Inference time: {inference_time:.3f}s")

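Accuracy Evaluation

Method

NPU inference output is compared character by character against the CPU reference output. The pass rate is the fraction of test samples whose NPU output matches the CPU output exactly.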
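
The two metrics can be sketched as follows. This is an illustration under stated assumptions, not the code in `evaluate.py`: the function names are hypothetical, and character accuracy is assumed here to be one minus the edit distance divided by the reference length, which is one common way to score a character-level comparison.

```python
def edit_distance(pred: str, ref: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(ref) + 1))
    for i, cp in enumerate(pred, 1):
        curr = [i]
        for j, cr in enumerate(ref, 1):
            curr.append(min(
                prev[j] + 1,               # delete a character from pred
                curr[j - 1] + 1,           # insert a character into pred
                prev[j - 1] + (cp != cr),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def char_accuracy(pred: str, ref: str) -> float:
    """Assumed definition: 1 - edit_distance / len(ref), floored at 0."""
    if not ref:
        return 1.0 if not pred else 0.0
    return max(0.0, 1.0 - edit_distance(pred, ref) / len(ref))


def pass_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (npu, cpu) pairs whose strings match exactly."""
    return sum(npu == cpu for npu, cpu in pairs) / len(pairs)


pairs = [("欢迎光临", "欢迎光临"), ("温馨提示", "温馨提示")]
print(char_accuracy(*pairs[0]), pass_rate(pairs))
```

Note that the reference here is the CPU output, so these metrics measure NPU/CPU agreement rather than accuracy against ground-truth labels.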

Results

Test image	NPU output	CPU output	Accuracy	Time (NPU)
image_ocr_recognition.jpg	欢迎光临	欢迎光临	100%	0.125s
ocr_web_demo.png	温馨提示	温馨提示	100%	0.089s
ocr_web.png	店主推荐正品代购100%专柜正品	店主推荐正品代购100%专柜正品	100%	0.223s
ocr_scene.png	大排档电话:58292825	大排档电话:58292825	100%	0.145s
ocr_web_demo.png (remote)	温馨提示	温馨提示	100%	0.088s

Accuracy Summary

Metric	Value
Character-level accuracy	100.00%
Sample pass rate	5/5 (100%)
NPU/CPU consistency	Identical outputs (error < 1%)

Performance Evaluation

Test Environment

Hardware: 2 x Huawei Ascend 910 NPU (single-card inference)
CANN: 8.5.1
CPU reference: ARM aarch64

Performance Comparison

Metric	NPU (Ascend 910)	CPU (aarch64)	Speedup
Average inference time	0.1339 s/image	6.6547 s/image	49.7x
Throughput	7.5 images/s	0.15 images/s	49.7x
First inference (incl. warmup)	0.587 s	6.396 s	10.9x
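
The derived columns follow directly from the average times: throughput is the reciprocal of the average time per image, and speedup is the CPU time divided by the NPU time. A short sketch of that arithmetic, using the figures reported in this README (the variable and function names are illustrative only):

```python
# Per-image NPU times (seconds) from the accuracy results table.
npu_times = [0.125, 0.089, 0.223, 0.145, 0.088]

# Averages reported in the performance table (seconds per image).
npu_avg = 0.1339
cpu_avg = 6.6547


def throughput(avg_seconds: float) -> float:
    """Images per second for a given average time per image."""
    return 1.0 / avg_seconds


def speedup(cpu_seconds: float, npu_seconds: float) -> float:
    """How many times faster the NPU run is than the CPU run."""
    return cpu_seconds / npu_seconds


print(f"mean of listed NPU times: {sum(npu_times) / len(npu_times):.3f} s")
print(f"NPU throughput: {throughput(npu_avg):.1f} images/s")
print(f"CPU throughput: {throughput(cpu_avg):.2f} images/s")
print(f"average speedup: {speedup(cpu_avg, npu_avg):.1f}x")
print(f"first-inference speedup: {speedup(6.396, 0.587):.1f}x")
```

The mean of the five rounded per-image times agrees with the reported 0.1339 s to within rounding.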

Per-Image Inference Time

NPU:
  image_ocr_recognition.jpg  ▏ 0.125s
  ocr_web_demo.png           ▏ 0.089s
  ocr_web.png                ▏ 0.223s
  ocr_scene.png              ▏ 0.145s
CPU:
  image_ocr_recognition.jpg  ██████████████████████████████████████████████████████ 6.534s
  ocr_web_demo.png           ██████████████████████████████████████████████████ 5.646s
  ocr_web.png                ██████████████████████████████████████████████████████████████████ 8.491s
  ocr_scene.png              ████████████████████████████████████████████████████████████ 7.066s

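Running the Evaluation

# Full accuracy and performance evaluation (NPU vs. CPU)
python3 evaluate.py

The evaluation script automatically performs the following steps:

NPU inference (including warmup)
CPU reference inference
Character-by-character accuracy comparison
Performance statistics and speedup calculation
JSON evaluation report output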
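
The timing and reporting steps above can be sketched as below. This is a hedged illustration, not the contents of `evaluate.py`: the `benchmark` function, its parameters, and the report keys are all assumptions, and `run` stands in for whatever callable performs the actual OCR inference.

```python
import json
import time
from statistics import mean
from typing import Callable, Iterable, Optional


def benchmark(run: Callable[[str], str],
              images: Iterable[str],
              warmup: int = 1,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Time run(image) for each image after a few untimed warmup calls.

    sync is an optional callable that waits for pending device work, so
    that asynchronous kernels are included in the measured time.
    """
    images = list(images)
    if not images:
        raise ValueError("no images to benchmark")
    for image in images[:warmup]:  # warmup results are discarded
        run(image)
    if sync:
        sync()
    results = []
    for image in images:
        start = time.perf_counter()
        text = run(image)
        if sync:
            sync()
        elapsed = time.perf_counter() - start
        results.append({"image": image, "text": text, "seconds": elapsed})
    avg = mean(r["seconds"] for r in results)
    return {
        "results": results,
        "avg_seconds": avg,
        "images_per_second": 1.0 / avg if avg > 0 else float("inf"),
    }


# Stand-in for real inference so the sketch runs without a model.
report = benchmark(lambda path: path.upper(), ["a.png", "b.png"])
print(json.dumps(report, ensure_ascii=False, indent=2))
```

On an NPU, `torch.npu.synchronize` would be a natural choice for `sync`; running the same function once per device and dividing the two `avg_seconds` values gives the speedup.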

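Deliverables

File	Description
`inference.py`	NPU inference script (supports single-image, batch, and benchmark modes)
`evaluate.py`	Accuracy and performance evaluation script
`evaluation_report.json`	Evaluation results (JSON format)
`README.md`	This deployment document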

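Model Limitations

The training datasets are themselves biased, so recognition accuracy may drop in certain scenarios (handwriting, low resolution, artistic fonts). This adaptation validates only the accuracy and performance of inference; fine-tuning and training adaptation are not covered.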

Citation

@article{wang2022ofa,
  author    = {Peng Wang and An Yang and Rui Men and Junyang Lin and
               Shuai Bai and Zhikang Li and Jianxin Ma and Chang Zhou and
               Jingren Zhou and Hongxia Yang},
  title     = {OFA: Unifying Architectures, Tasks, and Modalities Through
               a Simple Sequence-to-Sequence Learning Framework},
  journal   = {CoRR},
  volume    = {abs/2202.03052},
  year      = {2022}
}

License

Apache License 2.0