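PP-Chart2Table is a state-of-the-art (SOTA) multimodal model developed by the PaddlePaddle team for parsing Chinese and English charts. Its performance comes from a novel "Shuffled Chart Data Retrieval" training task. Combined with an optimized token-masking strategy, this task substantially improves the efficiency of chart-to-table conversion. An advanced data synthesis pipeline further strengthens the model. The pipeline draws on high-quality seed data, retrieval-augmented generation (RAG), and LLM persona design to build a richer, more diverse training set. To handle large volumes of unlabeled out-of-distribution (OOD) data, the team applied a two-stage distillation process, which gives the model strong adaptability and generalization on real-world data. In internal benchmarks, PP-Chart2Table outperforms models of comparable size and, in key application scenarios, performs on par with 7-billion-parameter vision-language models (VLMs).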
Example inference with Hugging Face Transformers:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_path = "PaddlePaddle/PP-Chart2Table_safetensors"

# Load the model (placed automatically on the available devices) and its processor.
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_path)

# PPChart2TableProcessor injects a hardcoded "Chart to table" instruction through its
# chat template, so the user turn only needs to carry the chart image.
conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png",
            },
        ],
    },
]

# Build the prompt, preprocess the image, and move the tensors to the model's device.
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    truncation=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding for deterministic output.
generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=256)

# Strip the prompt tokens so that only the newly generated table text is decoded.
generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
result = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(result)
```
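`result` is a list with one decoded string per input image. To use that text downstream, you may want to split it into rows and cells. The sketch below assumes the model emits one table row per line, with cells separated by `|`. This format is an assumption and is not documented above, so inspect the actual output first. The helpers `parse_chart_table` and `rows_to_csv` and the sample string are hypothetical and exist only for illustration.

```python
import csv
import io


def parse_chart_table(text: str, sep: str = "|") -> list[list[str]]:
    """Split a decoded table string into rows of stripped cells.

    Assumption (hypothetical format): one row per line, cells separated by `sep`.
    Blank lines and markdown-style separator rows such as `|---|---|` are skipped.
    """
    rows = []
    for line in text.strip().splitlines():
        line = line.strip().strip(sep)  # drop optional leading/trailing separators
        if not line:
            continue
        cells = [cell.strip() for cell in line.split(sep)]
        # Skip separator rows made only of dashes and colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize parsed rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical model output, used only to exercise the parser.
    sample = "Year | Sales\n2021 | 120\n2022 | 150"
    rows = parse_chart_table(sample)
    print(rows)
    print(rows_to_csv(rows))
```

With the inference example above, you would call `parse_chart_table(result[0])`. The parser keeps every cell as a string, so any numeric conversion is left to the caller. Chart values often carry units or thousands separators that would otherwise need to be guessed.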