cv_crnn_ocr-recognition-general_damo Ascend NPU Deployment Guide

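Project Overview

cv_crnn_ocr-recognition-general_damo is a general-scene OCR recognition model open-sourced by Alibaba DAMO Academy. It uses the CRNN (Convolutional Recurrent Neural Network) architecture: a CNN extracts features, a bidirectional LSTM models the sequence, and CTC loss enables end-to-end training. The model recognizes mixed Chinese and English text and outputs the recognized string.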

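Features

Inference acceleration on Ascend NPU
CPU vs NPU text recognition comparison (100% match on the test sample)
Recognition of mixed Chinese and English text
CTC-decoded output
About 183x faster than CPU inference in the measured test (steady state, excluding first-run compilation)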

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
Docker: container named test-modelagent
OpenCV: used for image preprocessing

Directory Structure

cv_crnn_ocr-recognition-general_damo-ascend/
├── inference.py          # Inference test script
├── log.txt               # Test log
├── README.md             # This document
├── inference_result.json # Inference results
├── precision_result.json # Precision test results
├── test_sample.pt        # Test sample
└── test_ocr_image.jpg    # Test image

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

3. Prepare the model files

The model files are located in /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo/iic/cv_crnn_ocr-recognition-general_damo/:

pytorch_model.pt - PyTorch model weights
vocab.txt - character mapping table
model.onnx - ONNX model (optional)

4. Install dependencies

pip install opencv-python torch_npu
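
After installing, a quick check that the packages can be found may save a failed run later. The following is a minimal sketch, not part of the repository; the module names (`torch`, `torch_npu`, `cv2`) are assumptions derived from the packages listed above.

```python
import importlib.util

# Assumed import names for PyTorch, the Ascend adapter, and opencv-python.
REQUIRED_MODULES = ("torch", "torch_npu", "cv2")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located on the import path."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Locating a module does not prove that the NPU driver works; it only confirms that the Python packages are installed.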

Usage

Option 1: Normal inference mode

Run the inference script to recognize text:

cd /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend/

python3 inference.py

Option 2: Precision test mode (CPU vs NPU)

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:

cd /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend/

python3 inference.py precision_test
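
The consistency check in this mode comes down to comparing the string decoded on the CPU with the one decoded on the NPU. A minimal sketch of such a comparison follows; the function name and the returned fields are hypothetical and are not taken from inference.py.

```python
def compare_texts(cpu_text: str, npu_text: str) -> dict:
    """Compare the decoded CPU and NPU strings for one test image."""
    match = cpu_text == npu_text
    return {
        "match": match,
        # Two empty strings are equal, but that says little about accuracy,
        # so this sketch reports the case separately.
        "both_empty": cpu_text == "" and npu_text == "",
        "status": "PASS" if match else "FAIL",
    }
```

Reporting `both_empty` separately is a design choice of this sketch: a match between two empty outputs is worth a second look before it is counted as a pass.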

Command-Line Arguments

Argument	Description	Default
`precision_test`	Run the full precision test	`normal` (mode used when the argument is omitted)
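
One way to handle a single optional positional argument like this is shown below. It is a sketch only: inference.py may parse its arguments differently, and the function name `parse_mode` is hypothetical.

```python
import argparse


def parse_mode(argv=None) -> str:
    """Return "precision_test" or "normal" from the command line."""
    parser = argparse.ArgumentParser(description="CRNN OCR NPU test (sketch)")
    parser.add_argument(
        "mode",
        nargs="?",          # the argument may be omitted
        default="normal",   # mode used when it is omitted
        choices=["normal", "precision_test"],
    )
    return parser.parse_args(argv).mode
```

With this sketch, `parse_mode([])` selects normal inference and `parse_mode(["precision_test"])` selects the precision test.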

Test Verification

Precision Test Results

Metric	Measured value	Status
Text match	100%	PASS
NPU inference time	0.0060s	-
CPU inference time	1.1024s	-
Speedup	182.98x	PASS

Performance Data

Operation	Time
NPU inference time (first run, including compilation)	6.12s
NPU inference time (steady state)	0.006s
CPU inference time	1.10s
Speedup	~183x
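
The speedup is the CPU time divided by the steady-state NPU time. Dividing the logged values gives 1.1024 / 0.0060 ≈ 183.73 rather than the reported 182.98, which is consistent with the script dividing unrounded timings. The sketch below, assuming the logged times are rounded to four decimal places, shows the range of speedups those rounded values allow.

```python
def speedup(cpu_seconds: float, npu_seconds: float) -> float:
    """Ratio of CPU time to NPU time."""
    return cpu_seconds / npu_seconds


def speedup_bounds(cpu_rounded: float, npu_rounded: float, decimals: int = 4):
    """Lowest and highest speedup compatible with rounded timings."""
    half = 0.5 * 10 ** (-decimals)  # largest rounding error per value
    low = (cpu_rounded - half) / (npu_rounded + half)
    high = (cpu_rounded + half) / (npu_rounded - half)
    return low, high


if __name__ == "__main__":
    print(round(speedup(1.1024, 0.0060), 2))
    print(speedup_bounds(1.1024, 0.0060))
```

The reported 182.98x falls inside the computed range, so the table and the log agree once rounding is taken into account.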

Test Log (log.txt)

============================================================
CRNN OCR NPU Test Suite
Output: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend
============================================================

Mode: PRECISION TEST

============================================================
CRNN OCR Inference Test (NPU)
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo/iic/cv_crnn_ocr-recognition-general_damo/pytorch_model.pt
Test image: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo/iic/cv_crnn_ocr-recognition-general_damo/resources/rec_result_visu.jpg
Loading state dict...
Loaded 70 entries
Building CRNN model...
Model built successfully
Loaded vocab with 7643 characters
Input shape: torch.Size([1, 1, 32, 640])
Inference time: 6.1224s
Output shape: torch.Size([160, 1, 7644])
Recognized text:

============================================================
Creating Test Samples
============================================================
Saved: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend/test_sample.pt
Copied test image to: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend/test_ocr_image.jpg

============================================================
CRNN OCR Precision Test (CPU vs NPU)
============================================================
Device: npu:0
Loading state dict...
Building CPU model...
Building NPU model...
Input shape: torch.Size([1, 1, 32, 640])
Running on CPU...
CPU time: 1.1024s
Running on NPU...
NPU time: 0.0060s

Speedup: 182.98x
CPU text:
NPU text:
Text match: True
Status: PASS

============================================================
Test Complete!
============================================================

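Model Structure

Architecture: CRNN (Convolutional Recurrent Neural Network)
CNN feature extraction: 8-layer convolutional network
Sequence modeling: 2-layer bidirectional LSTM (2 directions)
Embedding layer: 512 -> 256
Classifier: 512 -> 7644 (number of character classes)
Input size: 1 x 32 x 640 (grayscale image)
Output size: 160 x 1 x 7644 (time steps x batch x classes)

Component	Description
CNN	8 convolutional layers + BatchNorm + ReLU + MaxPool
LSTM	Bidirectional LSTM x 2 layers, hidden size 256
Embedding	512->256 linear layer
FC	512->7644 classification layer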
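
The dimensions above fit together through two small pieces of arithmetic: a bidirectional LSTM concatenates its forward and backward hidden states (2 x 256 = 512), and the 640 input columns become 160 time steps. The sketch below works through both. The width stride of 4 is an inference from the 640 -> 160 figures, not a value read from the checkpoint, and may not hold for other input widths.

```python
# Values taken from the structure listed above.
LSTM_HIDDEN = 256
NUM_CLASSES = 7644
# Assumption: the CNN shrinks the width by 4, inferred from 640 -> 160.
WIDTH_STRIDE = 640 // 160


def bilstm_features(hidden: int = LSTM_HIDDEN) -> int:
    """Forward and backward hidden states are concatenated."""
    return 2 * hidden


def output_shape(width: int, batch: int = 1) -> tuple:
    """(time steps, batch, classes) for an input of the given width."""
    return (width // WIDTH_STRIDE, batch, NUM_CLASSES)


if __name__ == "__main__":
    print(bilstm_features())   # feature size fed to the 512-input layers
    print(output_shape(640))   # shape reported in the test log
```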

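Inference Parameters

Key parameters extracted from the checkpoint:

Input channels: 1 (grayscale image)
Feature map size: 32 x 640 (height x width)
CNN output: 512 channels
LSTM hidden size: 256
Number of character classes: 7644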

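Character Mapping

vocab.txt contains 7643 character mappings, encoded starting from idx=1. Index 0 is the one CTC decoding skips (the blank), which accounts for the 7644 output classes. CTC decoding collapses consecutive repeated indices and skips index 0.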
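
The decoding rule above can be sketched as a greedy CTC decoder over the per-time-step class indices. This is an illustration under assumptions: vocab.txt is taken to hold one character per line, and the helper names `load_vocab` and `ctc_greedy_decode` are hypothetical rather than taken from inference.py.

```python
def load_vocab(lines):
    """Map class index -> character, starting at 1 (0 is the CTC blank)."""
    return {idx: line.rstrip("\r\n") for idx, line in enumerate(lines, start=1)}


def ctc_greedy_decode(indices, vocab, blank=0):
    """Collapse consecutive repeats, then drop blanks."""
    chars = []
    previous = None
    for idx in indices:
        # Emit a character only when the index changes and is not blank.
        if idx != previous and idx != blank:
            chars.append(vocab.get(idx, ""))
        previous = idx
    return "".join(chars)


if __name__ == "__main__":
    vocab = load_vocab(["a\n", "b\n"])
    # A blank between two identical indices keeps both characters.
    print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0], vocab))
```

With a PyTorch output of shape 160 x 1 x 7644, the index list for the single batch item could be obtained with something like `logits.argmax(dim=2)[:, 0].tolist()`; that wiring is an assumption about how the script uses the model output.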

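FAQ

Q: Why is the recognition result empty?

A: Check that the input is a single-line text image. Multi-line images or images without text may not be recognized.

Q: How can I improve inference speed?

A: Run on the NPU. The first inference includes compilation overhead; subsequent inferences are faster.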
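
Because the first call carries the one-off compilation cost, steady-state timings are usually taken after one or more warm-up calls. The helper below is a generic sketch of that pattern, not the timing code used by inference.py; the `sync` parameter is a hypothetical hook for waiting on queued device work.

```python
import time


def benchmark(fn, warmup=1, runs=10, sync=None):
    """Return the mean seconds per call of `fn`, excluding warm-up calls."""
    for _ in range(warmup):
        fn()          # absorbs one-off costs such as compilation
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()        # wait for queued device work before stopping the clock
    return (time.perf_counter() - start) / runs
```

On an NPU, one might pass `sync=torch.npu.synchronize` (available once torch_npu is imported) so that asynchronous kernels are included in the measurement; whether the original script does so is not stated in this document.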

References

Original model: https://modelscope.cn/models/damo/cv_crnn_ocr-recognition-general_damo
CRNN paper: https://arxiv.org/pdf/1507.05717.pdf
DAMO Academy: https://damo.alibaba.com/

License

This project is licensed under the Apache-2.0 license.