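trocr-small-printed is the small variant of Microsoft's TrOCR model, fine-tuned on the SROIE dataset for OCR of printed text. The model consists of an image Transformer encoder and a text Transformer decoder, and converts printed text in an image into editable text.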

Project layout:

trocr-small-printed-ascend/
├── inference.py            # inference test script
├── log.txt                 # test log
├── README.md               # this document
├── test_image.png          # test image
├── test_sample.txt         # test sample description
├── inference_result.json   # inference result
└── precision_result.json   # precision test result

Enter the container and load the Ascend toolkit environment:

docker exec -it test-modelagent bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh

The model files are located in /data/ysws/agentsp/5-17/trocr-small-printed/microsoft/trocr-small-printed/.

Install the dependencies:

pip install transformers torch_npu pillow -i https://pypi.huaweicloud.com/repository/pypi/simple/

Run the inference script for OCR recognition:

cd /data/ysws/agentsp/5-17/trocr-small-printed-ascend/
python3 inference.py --mode inference

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:

cd /data/ysws/agentsp/5-17/trocr-small-printed-ascend/
python3 inference.py --mode precision_test

Run all tests:

cd /data/ysws/agentsp/5-17/trocr-small-printed-ascend/
python3 inference.py --mode all

| Parameter | Description | Default |
|---|---|---|
| --mode | Test mode: inference, precision_test, or all | all |
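
The source of inference.py is not reproduced in this README, so the following is only a minimal sketch of how a `--mode` flag with these three values and the default `all` could be declared with `argparse`. The function name `parse_args` is hypothetical and not taken from the script.

```python
import argparse


def parse_args(argv=None):
    """Parse a --mode flag like the one documented above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="TrOCR NPU test (illustrative)")
    parser.add_argument(
        "--mode",
        choices=["inference", "precision_test", "all"],
        default="all",
        help="Test mode: inference, precision_test, or all",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of the real command line.
args = parse_args(["--mode", "inference"])
print(f"Selected mode: {args.mode}")
```

Restricting the flag with `choices` makes `argparse` reject a mistyped mode with a usage message instead of silently running nothing.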
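
Precision test results (CPU vs NPU):

| Metric | Measured | Threshold | Status |
|---|---|---|---|
| Max relative error | 0.0000% | < 1.00% | PASS |
| Max absolute error | 0.00e+00 | - | - |
| CPU inference time | 1.784s | - | - |
| NPU inference time | 0.037s | - | - |
| Speedup | 47.81x | > 1x | PASS |
| Output text consistency | Identical | - | PASS |

Timing:

| Operation | Time |
|---|---|
| NPU inference time (inference mode) | 5.629s |
| Precision test CPU time | 1.784s |
| Precision test NPU time | 0.037s |
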
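
The README does not state which values inference.py compares or how it computes these metrics. As a rough sketch, assuming the usual element-wise definitions, the error and speedup figures could be derived as below; the helper names and sample values are made up for illustration.

```python
def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two equal-length sequences."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


def max_rel_error(reference, candidate, eps=1e-12):
    """Largest element-wise relative difference, as a percentage of the reference magnitude."""
    return 100.0 * max(
        abs(r - c) / max(abs(r), eps) for r, c in zip(reference, candidate)
    )


def speedup(cpu_seconds, npu_seconds):
    """Ratio of CPU time to NPU time; values above 1 mean the NPU was faster."""
    return cpu_seconds / npu_seconds


# Made-up values standing in for CPU and NPU outputs
cpu_out = [0.5, -1.25, 2.0]
npu_out = [0.5, -1.25, 2.0]

print(max_abs_error(cpu_out, npu_out))   # 0.0
print(max_rel_error(cpu_out, npu_out))   # 0.0
print(round(speedup(1.784, 0.037), 2))   # 48.22
```

Dividing the rounded timings gives about 48.22x rather than the logged 47.81x, presumably because the script divides the unrounded measurements.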
Test sample: the input is a 384x64 white image, and the output text is "1" (the recognition result for a blank image).

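
How inference.py generates test_image.png is not shown here. A blank sample of the same size can be produced with Pillow roughly as follows; `make_blank_image` is a hypothetical helper name.

```python
from PIL import Image


def make_blank_image(width=384, height=64):
    """Create a solid white RGB image (default size matches the test sample)."""
    return Image.new("RGB", (width, height), color="white")


image = make_blank_image()
print(image.size)  # (384, 64)
# image.save("test_image.png")  # uncomment to write the file to disk
```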
Recognize a local image:

import torch
import torch_npu  # importing torch_npu registers the "npu" device with PyTorch
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

MODEL_DIR = "/data/ysws/agentsp/5-17/trocr-small-printed/microsoft/trocr-small-printed"

# Load the image processor, tokenizer, and model, then move the model to the NPU
image_processor = AutoImageProcessor.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = VisionEncoderDecoderModel.from_pretrained(MODEL_DIR)
model = model.to("npu:0").eval()

# Preprocess the image into a pixel tensor on the same device as the model
image = Image.open("your_image.png").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")
pixel_values = inputs["pixel_values"].to("npu:0")

# Generate token ids without tracking gradients, then decode them to text
with torch.no_grad():
    generated_ids = model.generate(pixel_values)
text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Recognized text: {text}")

Recognize an image fetched from a URL, reusing image_processor, tokenizer, and model from the example above:

from PIL import Image
import requests

url = "https://example.com/your_image.png"  # placeholder URL; replace with your own image
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")
pixel_values = inputs["pixel_values"].to("npu:0")
with torch.no_grad():
    generated_ids = model.generate(pixel_values)
text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

| Component | Description |
|---|---|
| encoder | DeiT image Transformer encoder |
| decoder | TrOCR text Transformer decoder |
| generate | Autoregressive text generation |

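
To illustrate what autoregressive generation means, here is a toy greedy decoding loop. It is not the implementation of `model.generate`, which supports several decoding strategies; the stand-in `toy_next_token` function is hypothetical and replaces a real decoder forward pass. The start and end ids are both set to 2, matching `decoder_start_token_id` and `eos_token_id` in this model's config.json.

```python
def greedy_decode(next_token_fn, start_id, eos_id, max_new_tokens=20):
    """Generate token ids one at a time until eos_id or the length limit is reached."""
    tokens = [start_id]
    for _ in range(max_new_tokens):
        # Stand-in for a decoder forward pass followed by an argmax over the vocabulary
        next_id = next_token_fn(tokens)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_next_token(tokens):
    """Hypothetical stand-in model: emits 10, 11, 12 and then the end token 2."""
    script = [10, 11, 12, 2]
    return script[len(tokens) - 1]


print(greedy_decode(toy_next_token, start_id=2, eos_id=2))  # [2, 10, 11, 12, 2]
```

Each step feeds all previously generated tokens back in, which is why generation cost grows with output length.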
Key parameters extracted from config.json:

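{
  "model_type": "vision-encoder-decoder",
  "encoder": {
    "model_type": "deit",
    "hidden_size": 384,
    "num_hidden_layers": 12,
    "num_attention_heads": 6,
    "image_size": 384
  },
  "decoder": {
    "model_type": "trocr",
    "d_model": 256,
    "decoder_layers": 6,
    "decoder_attention_heads": 8,
    "vocab_size": 64044
  },
  "eos_token_id": 2,
  "pad_token_id": 1,
  "decoder_start_token_id": 2
}

A: Check that the NPU driver is installed correctly. In the precision test, the model produces identical output on CPU and NPU, with 0% error.

A: Make sure the input image is sharp and high-contrast. This model performs best on printed text; for handwriting, use a different model.

A: Blank or low-contrast images may produce empty or meaningless text (the blank test image yields "1"). Make sure the image contains clear printed text.
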
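
For the driver check mentioned in the first answer, one quick way to see whether PyTorch can reach the NPU is sketched below. It assumes `torch_npu` exposes `torch.npu.is_available()` once imported; `pick_device` is a hypothetical helper that falls back to the CPU when the NPU stack is missing.

```python
def pick_device():
    """Return "npu:0" when torch_npu reports a usable NPU, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the NPU backend)
    except (ImportError, OSError):
        # torch or torch_npu is not installed, or its shared libraries failed to load
        return "cpu"
    return "npu:0" if torch.npu.is_available() else "cpu"


print(f"Using device: {pick_device()}")
```

If this prints `cpu` inside the container, the toolkit environment script may not have been sourced or the driver may not be visible to the container.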
============================================================
TrOCR NPU Test
Model: microsoft/trocr-small-printed
Output: /data/ysws/agentsp/5-17/trocr-small-printed-ascend
============================================================
============================================================
TrOCR Inference Test (NPU)
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-17/trocr-small-printed/microsoft/trocr-small-printed
Loading processor...
Loading model...
Loading weights: 100%|██████████| 360/360 [00:00<00:00, 4602.19it/s]
Model loaded successfully
Input image size: (384, 64)
Pixel values shape: torch.Size([1, 3, 384, 384])
Generated text: 1
Inference time: 5.629s
Inference result saved to /data/ysws/agentsp/5-17/trocr-small-printed-ascend/inference_result.json
============================================================
Precision Test (CPU vs NPU)
============================================================
Using device: npu:0
Loading processor...
Loading model on CPU...
Loading model on npu:0...
Running inference on CPU...
Running inference on NPU...
CPU inference time: 1.784s
NPU inference time: 0.037s
Speedup: 47.81x
CPU text: 1
NPU text: 1
Texts match: True
Max absolute error: 0.000000e+00
Max relative error: 0.0000% (threshold: 1.0%)
Status: PASS
Precision result saved to /data/ysws/agentsp/5-17/trocr-small-printed-ascend/precision_result.json
============================================================
Creating Test Sample
============================================================
Saved test image: /data/ysws/agentsp/5-17/trocr-small-printed-ascend/test_image.png
Saved test sample info: /data/ysws/agentsp/5-17/trocr-small-printed-ascend/test_sample.txt
============================================================
Test Complete!
============================================================

This project is licensed under the Apache-2.0 license.