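cv_crnn_ocr-recognition-general_damo is a general-scene OCR text recognition model open-sourced by Alibaba DAMO Academy. It is based on the CRNN (Convolutional Recurrent Neural Network) architecture, which combines CNN feature extraction with bidirectional LSTM sequence modeling and is trained end to end with CTC loss. The model recognizes mixed Chinese and English text and outputs the corresponding string.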

    cv_crnn_ocr-recognition-general_damo-ascend/
    ├── inference.py # Inference test script
    ├── log.txt # Test log
    ├── README.md # This document
    ├── inference_result.json # Inference results
    ├── precision_result.json # Precision test results
    ├── test_sample.pt # Test sample
    └── test_ocr_image.jpg # Test image

    docker exec -it test-modelagent bash
    source /usr/local/Ascend/ascend-toolkit/set_env.sh

The model files are located in the /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo/iic/cv_crnn_ocr-recognition-general_damo/ directory.

    pip install opencv-python torch_npu

Run the inference script to recognize text:

    cd /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend/
    python3 inference.py

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:

    cd /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend/
    python3 inference.py precision_test

| Argument | Description | Default |
|---|---|---|
| precision_test | Run the full precision test | normal |
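
The way `inference.py` reads its argument is not shown in this README. As a minimal sketch, assuming the script takes one optional positional mode and that `normal` is the mode used when no argument is given, the dispatch could look like this. The function name `parse_mode` and the help text are hypothetical.

```python
import argparse


def parse_mode(argv):
    """Parse the optional positional mode argument (hypothetical CLI layout)."""
    parser = argparse.ArgumentParser(description="CRNN OCR NPU test (sketch)")
    parser.add_argument(
        "mode",
        nargs="?",                       # the argument may be omitted
        default="normal",                # assumed default, taken from the table above
        choices=["normal", "precision_test"],
        help="normal: single inference run; precision_test: CPU vs NPU comparison",
    )
    return parser.parse_args(argv).mode


# Equivalent to `python3 inference.py` and `python3 inference.py precision_test`.
print(parse_mode([]))                    # normal
print(parse_mode(["precision_test"]))    # precision_test
```

Passing the argument list explicitly keeps the function easy to test; a real script would typically pass `sys.argv[1:]`.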
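
| Metric | Measured value | Status |
|---|---|---|
| Text match | 100% | PASS |
| NPU inference time | 0.0060s | - |
| CPU inference time | 1.1024s | - |
| Speedup | 182.98x | PASS |

| Operation | Time |
|---|---|
| NPU inference time | 6.12s (including first-run compilation) |
| NPU inference time (steady state) | 0.006s |
| CPU inference time | 1.10s |
| Speedup | ~183x |
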
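
The tables separate the first NPU run (which includes compilation) from the steady-state time. A minimal sketch of how such a measurement can be taken is shown below. `time_inference` and `speedup` are hypothetical helpers, not functions from `inference.py`, and `run_once` stands for any callable that performs one complete inference. On an asynchronous device the callable should block until the result is ready, otherwise the timing is too optimistic.

```python
import time


def time_inference(run_once, repeats=5):
    """Return (first_call_seconds, steady_state_seconds) for a callable."""
    start = time.perf_counter()
    run_once()                           # first call: may include one-time compilation
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):             # later calls: averaged steady-state time
        run_once()
    steady = (time.perf_counter() - start) / repeats
    return first, steady


def speedup(cpu_seconds, npu_seconds):
    """Ratio of CPU time to NPU time."""
    return cpu_seconds / npu_seconds


# Dividing the rounded values from the table gives a slightly different ratio
# than the reported 182.98x, which was presumably computed from unrounded timings.
print(round(speedup(1.1024, 0.0060), 2))  # 183.73
```
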
============================================================
CRNN OCR NPU Test Suite
Output: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend
============================================================
Mode: PRECISION TEST
============================================================
CRNN OCR Inference Test (NPU)
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo/iic/cv_crnn_ocr-recognition-general_damo/pytorch_model.pt
Test image: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo/iic/cv_crnn_ocr-recognition-general_damo/resources/rec_result_visu.jpg
Loading state dict...
Loaded 70 entries
Building CRNN model...
Model built successfully
Loaded vocab with 7643 characters
Input shape: torch.Size([1, 1, 32, 640])
Inference time: 6.1224s
Output shape: torch.Size([160, 1, 7644])
Recognized text:
============================================================
Creating Test Samples
============================================================
Saved: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend/test_sample.pt
Copied test image to: /data/ysws/agentsp/5-19-1/cv_crnn_ocr-recognition-general_damo-ascend/test_ocr_image.jpg
============================================================
CRNN OCR Precision Test (CPU vs NPU)
============================================================
Device: npu:0
Loading state dict...
Building CPU model...
Building NPU model...
Input shape: torch.Size([1, 1, 32, 640])
Running on CPU...
CPU time: 1.1024s
Running on NPU...
NPU time: 0.0060s
Speedup: 182.98x
CPU text:
NPU text:
Text match: True
Status: PASS
============================================================
Test Complete!
============================================================

| Component | Description |
|---|---|
| CNN | 8 convolutional layers + BatchNorm + ReLU + MaxPool |
| LSTM | 2 bidirectional LSTM layers, hidden size 256 |
| Embedding | 512->256 linear layer |
| FC | 512->7644 classification layer |
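
The shapes in the test log can be cross-checked against this table with a little arithmetic. The sketch below assumes the CNN reduces the image width by a factor of 4 overall, which is inferred from the logged shapes (width 640 in, 160 timesteps out) and not from the model code. It also assumes the extra output class is the CTC blank. `crnn_output_shape` is a hypothetical helper.

```python
def crnn_output_shape(input_shape, vocab_size, width_stride=4):
    """Predict the (timesteps, batch, classes) shape of the CRNN output.

    width_stride=4 is an assumption inferred from the log, not read from the model.
    """
    batch, _channels, _height, width = input_shape
    timesteps = width // width_stride    # one prediction per downsampled column
    num_classes = vocab_size + 1         # vocabulary plus one CTC blank class
    return (timesteps, batch, num_classes)


# Input torch.Size([1, 1, 32, 640]) and a 7643-character vocabulary, as in the log.
print(crnn_output_shape((1, 1, 32, 640), vocab_size=7643))  # (160, 1, 7644)
```

The 512 input features of the FC layer are consistent with a bidirectional LSTM whose hidden size is 256, since the forward and backward states are concatenated.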
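Key parameters extracted from the checkpoint:

vocab.txt contains 7643 character mappings, encoded starting from idx=1. During CTC decoding, index 0 (the CTC blank) and consecutive repeated characters are skipped.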
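
The decoding rule above corresponds to greedy CTC decoding. The following is a minimal pure-Python sketch, not the code from `inference.py`. It assumes the vocabulary is loaded as a list whose first entry corresponds to class index 1, and that the model output has already been converted to one row of scores per timestep. The function names are hypothetical.

```python
def argmax_per_step(scores):
    """Pick the highest-scoring class index for every timestep."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def ctc_greedy_decode(indices, vocab, blank=0):
    """Collapse consecutive repeats, drop blanks, and map indices to characters."""
    chars = []
    prev = blank
    for idx in indices:
        if idx != blank and idx != prev:
            chars.append(vocab[idx - 1])  # class 1 maps to vocab[0]
        prev = idx
    return "".join(chars)


# Toy vocabulary; the real vocab.txt has 7643 entries.
vocab = ["a", "b", "c"]
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0, 3], vocab))  # aabc
```

The blank between the two runs of index 1 is what allows a genuinely repeated character ("aa") to survive the collapse step.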
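A: Check that the input is a single-line text image. Multi-line images or images without text may not be recognized.
A: Inference runs on the NPU, and the first inference includes a compilation overhead (6.12s in the test log, against 0.006s once compiled). Subsequent inferences are faster.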
This project is licensed under the Apache-2.0 license.