PP-DocBee2-3B

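Introduction

The PaddleOCR team developed PP-DocBee2-3B, a multimodal large model with significantly enhanced Chinese document understanding. Building on the original PP-DocBee, this version introduces an improved data optimization scheme that raises data quality. Using a relatively small dataset of 470,000 synthetic samples generated with a proprietary data synthesis strategy, PP-DocBee2 achieves strong performance on Chinese document understanding tasks. In internal testing, PP-DocBee2 improves on its predecessor PP-DocBee by 11.4% on Chinese business-scenario metrics. It also outperforms other mainstream open-source and closed-source models of comparable scale on key accuracy metrics, which are listed below: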

Model	Model Storage Size (GB)	Total Score
PP-DocBee-2B	4.2	765
PP-DocBee-7B	15.8	-
PP-DocBee2-3B	7.6	852

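Note: The total scores above were measured on an internal evaluation set of 1,196 samples, with all images at a resolution (height, width) of (1680, 1204). The set covers financial reports, laws and regulations, scientific and technical papers, manuals, humanities papers, contracts, research reports, and similar scenarios. There are currently no plans to release it publicly.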
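As a quick sanity check, the relative gain can be recomputed from the total scores in the table. This is a minimal sketch that assumes the 11.4% figure refers to the total score of PP-DocBee2-3B relative to PP-DocBee-2B; the text does not state which predecessor variant or metric the figure is based on.

```python
# Total scores copied from the table above.
scores = {"PP-DocBee-2B": 765, "PP-DocBee2-3B": 852}


def relative_gain(old: float, new: float) -> float:
    """Return the relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Assumption: the reported improvement compares these two total scores.
gain = relative_gain(scores["PP-DocBee-2B"], scores["PP-DocBee2-3B"])
print(f"Relative gain: {gain:.1f}%")
```

Under that assumption, (852 - 765) / 765 rounds to 11.4%, which matches the reported improvement.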

Quick Start

Installation

PaddlePaddle

Install PaddlePaddle with pip using one of the following commands:

# for CUDA11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# for CUDA12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# for CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For details on installing PaddlePaddle, see the official PaddlePaddle website.

PaddleOCR

Install the latest version of the PaddleOCR inference package from PyPI:

python -m pip install paddleocr
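After installing, it can help to confirm which packages actually landed in the active environment. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical and introduced here for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip commands above.
# Normally only one of paddlepaddle / paddlepaddle-gpu is installed.
for name in ("paddlepaddle", "paddlepaddle-gpu", "paddleocr"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

This only checks that the distributions are present. It does not verify that the GPU build matches your CUDA version.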

Model Usage

You can try the model quickly with a single command:

paddleocr doc_vlm \
    --model_name PP-DocBee2-3B \
    -i "{'image': 'https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/l5xpHbfLn75dKInhQZ84I.png', 'query': 'Recognize the content of this table and output it in markdown format.'}"
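The `-i` argument above is a string that looks like a Python dict literal, which is easy to get wrong when quoting by hand. The following sketch builds the same command programmatically. It assumes the CLI accepts any valid dict literal of this shape; the helper name `build_doc_vlm_command` and the path `./table.png` are hypothetical.

```python
import ast
import shlex


def build_doc_vlm_command(image: str, query: str,
                          model_name: str = "PP-DocBee2-3B") -> list:
    """Build the argument list for `paddleocr doc_vlm` as shown above."""
    payload = {"image": image, "query": query}
    # repr() of a dict of strings yields a Python dict literal.
    return ["paddleocr", "doc_vlm",
            "--model_name", model_name,
            "-i", repr(payload)]


cmd = build_doc_vlm_command(
    "./table.png",  # hypothetical local image path
    "Recognize the content of this table and output it in markdown format.",
)

# Round-trip check: the last argument parses back to the original dict.
assert ast.literal_eval(cmd[-1])["image"] == "./table.png"

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` avoids shell quoting altogether, since each element is handed to the program as a separate argument.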

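You can also integrate the model inference of the document vision-language module into your own project. Before running the following code, please download the sample image to your local machine.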

from paddleocr import DocVLM
model = DocVLM(model_name="PP-DocBee2-3B")
results = model.predict(
    input={
        "image": "https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/l5xpHbfLn75dKInhQZ84I.png", 
        "query": "Recognize the content of this table and output it in markdown format."
    },
    batch_size=1
)
for res in results:
    res.print()
    res.save_to_json("./output/res.json")

Running the code produces the following result:

{'res': {'image': 'medal_table_en.png', 'query': 'Recognize the content of this table and output it in markdown format', 'result': '| Rank | Country/Region | Gold | Silver | Bronze | Total Medals |\n|---|---|---|---|---|---|\n| 1 | China (CHN) | 48 | 22 | 30 | 100 |\n| 2 | United States (USA) | 36 | 39 | 37 | 112 |\n| 3 | Russia (RUS) | 24 | 13 | 23 | 60 |\n| 4 | Great Britain (GBR) | 19 | 13 | 19 | 51 |\n| 5 | Germany (GER) | 16 | 11 | 14 | 41 |\n| 6 | Australia (AUS) | 14 | 15 | 17 | 46 |\n| 7 | South Korea (KOR) | 13 | 11 | 8 | 32 |\n| 8 | Japan (JPN) | 9 | 8 | 8 | 25 |\n| 9 | Italy (ITA) | 8 | 9 | 10 | 27 |\n| 10 | France (FRA) | 7 | 16 | 20 | 43 |\n| 11 | Netherlands (NED) | 7 | 5 | 4 | 16 |\n| 12 | Ukraine (UKR) | 7 | 4 | 11 | 22 |\n| 13 | Kenya (KEN) | 6 | 4 | 6 | 16 |\n| 14 | Spain (ESP) | 5 | 11 | 3 | 19 |\n| 15 | Jamaica (JAM) | 5 | 4 | 2 | 11 |\n'}}

The result rendered as a table:

| Rank | Country/Region | Gold | Silver | Bronze | Total Medals |
|---|---|---|---|---|---|
| 1 | China (CHN) | 48 | 22 | 30 | 100 |
| 2 | United States (USA) | 36 | 39 | 37 | 112 |
| 3 | Russia (RUS) | 24 | 13 | 23 | 60 |
| 4 | Great Britain (GBR) | 19 | 13 | 19 | 51 |
| 5 | Germany (GER) | 16 | 11 | 14 | 41 |
| 6 | Australia (AUS) | 14 | 15 | 17 | 46 |
| 7 | South Korea (KOR) | 13 | 11 | 8 | 32 |
| 8 | Japan (JPN) | 9 | 8 | 8 | 25 |
| 9 | Italy (ITA) | 8 | 9 | 10 | 27 |
| 10 | France (FRA) | 7 | 16 | 20 | 43 |
| 11 | Netherlands (NED) | 7 | 5 | 4 | 16 |
| 12 | Ukraine (UKR) | 7 | 4 | 11 | 22 |
| 13 | Kenya (KEN) | 6 | 4 | 6 | 16 |
| 14 | Spain (ESP) | 5 | 11 | 3 | 19 |
| 15 | Jamaica (JAM) | 5 | 4 | 2 | 11 |
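Because the `result` field is a plain markdown string, downstream code usually needs to turn it into structured rows. Below is a minimal sketch of such a parser, not part of PaddleOCR. The function name `parse_markdown_table` is hypothetical, and the parser assumes a simple pipe table with one header row, one separator row, and no escaped `|` characters inside cells.

```python
def parse_markdown_table(text: str) -> list:
    """Parse a simple markdown pipe table into a list of dicts keyed by header."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]

    def split_row(line: str) -> list:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # skip the header and the |---| separator row
        rows.append(dict(zip(header, split_row(line))))
    return rows


# First rows of the `result` string shown above.
sample = (
    "| Rank | Country/Region | Gold | Silver | Bronze | Total Medals |\n"
    "|---|---|---|---|---|---|\n"
    "| 1 | China (CHN) | 48 | 22 | 30 | 100 |\n"
    "| 2 | United States (USA) | 36 | 39 | 37 | 112 |\n"
)
rows = parse_markdown_table(sample)
print(rows[0]["Country/Region"], rows[0]["Gold"])
```

All cell values stay strings; convert numeric columns with `int()` where needed. Model output is not guaranteed to be a well-formed table, so real code should validate the row lengths before trusting the result.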

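For details on the commands and parameters, see the documentation.

Pipeline Usage

A single model has limited capability. A pipeline composed of several models can offer more to tackle complex problems in real-world scenarios.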

doc_understanding

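The document understanding pipeline is an advanced document processing technique based on vision-language models (VLMs), designed to overcome the limitations of traditional document processing. The pipeline contains only one module:

Document vision-language module

Run the following single command to quickly try the document understanding pipeline: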

paddleocr doc_understanding -i "{'image': 'https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/l5xpHbfLn75dKInhQZ84I.png', 'query': 'Recognize the content of this table and output it in markdown format.'}"

The result is printed to the terminal:

{'res': {'image': 'medal_table_en.png', 'query': 'Recognize the content of this table and output it in markdown format', 'result': '| Rank | Country/Region | Gold | Silver | Bronze | Total Medals |\n|---|---|---|---|---|---|\n| 1 | China (CHN) | 48 | 22 | 30 | 100 |\n| 2 | United States (USA) | 36 | 39 | 37 | 112 |\n| 3 | Russia (RUS) | 24 | 13 | 23 | 60 |\n| 4 | Great Britain (GBR) | 19 | 13 | 19 | 51 |\n| 5 | Germany (GER) | 16 | 11 | 14 | 41 |\n| 6 | Australia (AUS) | 14 | 15 | 17 | 46 |\n| 7 | South Korea (KOR) | 13 | 11 | 8 | 32 |\n| 8 | Japan (JPN) | 9 | 8 | 8 | 25 |\n| 9 | Italy (ITA) | 8 | 9 | 10 | 27 |\n| 10 | France (FRA) | 7 | 16 | 20 | 43 |\n| 11 | Netherlands (NED) | 7 | 5 | 4 | 16 |\n| 12 | Ukraine (UKR) | 7 | 4 | 11 | 22 |\n| 13 | Kenya (KEN) | 6 | 4 | 6 | 16 |\n| 14 | Spain (ESP) | 5 | 11 | 3 | 19 |\n| 15 | Jamaica (JAM) | 5 | 4 | 2 | 11 |\n'}}

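If save_path is specified, the visualization results are saved under the save_path directory. The visualization output looks like this:

[Figure: visualization output (image not included)]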

The command line is convenient for a quick trial. Integrating the pipeline into a project also takes only a few lines of code:

from paddleocr import DocUnderstanding

pipeline = DocUnderstanding(
    doc_understanding_model_name="PP-DocBee2-3B"
)
output = pipeline.predict(
    {
        "image": "https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/l5xpHbfLn75dKInhQZ84I.png",
        "query": "Recognize the content of this table and output it in markdown format."
    }
)
for res in output:
    res.print()  # Print the structured output of the prediction
    res.save_to_json("./output/")

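The pipeline uses PP-DocBee2-3B by default, so there is no need to pass PP-DocBee2-3B to the doc_understanding_model_name parameter. You can, however, use local model files through the doc_understanding_model_dir parameter. For details on the commands and parameters, see the documentation.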
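Once results have been saved with `save_to_json`, they can be read back with the standard library. The sketch below is an assumption-laden illustration: the helper `load_results` is hypothetical, and the exact layout of the saved files is not documented here, so the code accepts the fields either at the top level or nested under a `res` key as in the printed output.

```python
import json
from pathlib import Path


def load_results(output_dir: str) -> list:
    """Collect the query and result fields from every JSON file in a directory."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    results = []
    for path in sorted(root.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        if not isinstance(payload, dict):
            continue
        # Assumption: fields may be top-level or nested under "res".
        record = payload.get("res", payload)
        results.append({
            "file": path.name,
            "query": record.get("query"),
            "result": record.get("result"),
        })
    return results


# "./output" is the directory used in the example above.
for item in load_results("./output"):
    print(item["file"], "->", item["query"])
```

Check one saved file by hand first and adjust the key lookup if the layout differs from this assumption.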

Links

PaddleOCR repository

PaddleOCR documentation