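Text image rectification applies a geometric transformation to a document image to correct warping, skew, and perspective distortion, so that downstream text recognition is more accurate.
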
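
To make "geometric transformation" concrete, the sketch below corrects the simplest of the distortions named above, perspective distortion, by estimating a homography from four corner correspondences. It is an illustration only and not how UVDoc works internally. The function names `estimate_homography` and `apply_homography` and the corner coordinates are hypothetical, chosen for this example.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from 4+ point pairs (DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, points):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical corners of a photographed page (pixels) and the flat page we want.
page_corners = [(30, 40), (410, 20), (440, 590), (10, 560)]
flat_corners = [(0, 0), (400, 0), (400, 600), (0, 600)]
H = estimate_homography(page_corners, flat_corners)
```

A single homography can only undo a planar perspective view. Curled or folded pages need a dense, per-pixel mapping, which is the kind of problem a learned rectification model is meant to address.
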
| Model | CER |
|---|---|
| UVDoc | 0.179 |

Note: CER is the character error rate (lower is better), measured on the DocUNet benchmark dataset.

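
For readers unfamiliar with the metric, the following is a minimal sketch of the usual CER definition: the character-level edit distance between a reference string and a recognized string, divided by the reference length. The source does not describe the exact evaluation pipeline (for example, which OCR engine produced the text), so this should be read as the general formula rather than the script behind the number above. The names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character out of eight.
cer = character_error_rate("document", "docunent")
```

Because insertions count as errors, CER can exceed 1.0 when the recognized text is much longer than the reference.
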
```python
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Load the UVDoc model and its image processor from the Hugging Face Hub.
model_path = "PaddlePaddle/UVDoc_safetensors"
model = AutoModel.from_pretrained(model_path, device_map="auto")
image_processor = AutoImageProcessor.from_pretrained(model_path)

# Download a sample document image and preprocess it on the model's device.
image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/doc_test.jpg", stream=True).raw)
inputs = image_processor(images=image, return_tensors="pt").to(model.device)
outputs = model(**inputs)

# Post-process the model output together with the original image to obtain the rectified result.
result = image_processor.post_process_document_rectification(outputs.last_hidden_state, inputs["original_images"])
print(result)
```