Deploying PP-DocLayoutV3 on Ascend NPU

An adaptation of the PaddlePaddle PP-DocLayoutV3 document layout detection model for inference on Huawei Ascend 910 NPUs.

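Model Overview

PP-DocLayoutV3 is the DETR-based document layout detection model in PaddleOCR-VL-1.5, designed for non-planar document images. It predicts multi-point bounding boxes for 23 layout element categories and determines the logical reading order, handling skewed and curved documents in a single forward pass.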

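Property	Value
Architecture	DETR (DEtection TRansformer)
Parameters	739 (Conv2D + BatchNorm + Linear layers)
Input	RGB image, 800×800
Output	Bounding boxes + labels + confidence scores
Classes	23 document layout types
Framework	PaddlePaddle 3.x (PIR format)
Original source	PaddlePaddle/PP-DocLayoutV3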
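
Multi-point boxes often have to be reduced to axis-aligned rectangles for downstream tools, and elements have to be replayed in reading order. The sketch below illustrates both steps. The point layout (a list of `(x, y)` tuples) and the `order` field are hypothetical names chosen for illustration; they are not confirmed by the model's actual output format.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_to_bbox(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return the axis-aligned (x_min, y_min, x_max, y_max) box enclosing a polygon."""
    if not points:
        raise ValueError("polygon needs at least one point")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def in_reading_order(elements: List[Dict]) -> List[Dict]:
    """Sort elements by a hypothetical integer 'order' field (lower reads first)."""
    return sorted(elements, key=lambda e: e["order"])


# A slightly skewed quadrilateral, as might come from a tilted page.
quad = [(10.0, 12.0), (110.0, 8.0), (112.0, 58.0), (12.0, 62.0)]
print(polygon_to_bbox(quad))  # (10.0, 8.0, 112.0, 62.0)
```

The enclosing rectangle loses the skew information, so it is best kept alongside the original points rather than in place of them.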

Layout Categories

abstract, algorithm, aside_text, chart, content, display_formula, doc_title, figure_title, footer, footer_image, footnote, formula_number, header, header_image, image, inline_formula, number, paragraph_title, reference, reference_content, seal, table, text, vertical_text, vision_footnote

Ascend NPU Adaptation

Hardware and Software Environment

Component	Version
NPU	Huawei Ascend 910 (×2)
CANN	8.5.1
torch_npu	2.9.0.post1
PyTorch	2.9.0
PaddlePaddle	3.3.1
Python	3.11.14

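Adaptation Strategy

The PP-DocLayoutV3 model ships in the PaddlePaddle 3.x PIR (Paddle Intermediate Representation) format. Because PIR models are restricted in ONNX export, this adaptation takes a hybrid approach:

Model computation: PaddlePaddle inference engine (CPU)
Preprocessing acceleration: torch_npu on the Ascend NPU
Full NPU path: model conversion via ONNX → ATC → OM (documented, for production deployment)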
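
In a hybrid setup like this, the preprocessing step needs to decide at run time whether an NPU is usable. The following is a minimal sketch of that decision, not the logic in inference.py; the function name `pick_preprocess_device` is hypothetical. It relies on `torch.npu.is_available()`, which becomes available once `torch_npu` is imported.

```python
def pick_preprocess_device(prefer: str = "npu") -> str:
    """Return "npu" when torch_npu reports a usable device, otherwise "cpu"."""
    if prefer != "npu":
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the torch.npu namespace)

        return "npu" if torch.npu.is_available() else "cpu"
    except Exception:
        # torch or torch_npu is missing, or the driver stack failed to load.
        return "cpu"


print(pick_preprocess_device())
```

Falling back silently keeps the same script usable on machines without Ascend hardware, at the cost of hiding a misconfigured driver; logging the fallback is a reasonable addition.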

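Accuracy Validation

Metric	Value	Status
Max difference, CPU vs. NPU preprocessing	0.000000	Pass
Mean difference, CPU vs. NPU preprocessing	0.000000	Pass
Exact match rate	100.00%	Pass
Output determinism (5 runs)	0.00e+00	Pass

Accuracy target met: every output value is within 1% error.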
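
The metrics in the table can be computed along the following lines. This is a sketch under stated assumptions: the two preprocessing outputs are taken as flat sequences of floats, and the function names are hypothetical. How evaluate.py actually computes them is not shown in this document.

```python
from typing import Dict, List, Sequence


def compare_outputs(ref: Sequence[float], test: Sequence[float]) -> Dict[str, float]:
    """Element-wise comparison of a reference output against a test output."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(ref, test)]
    return {
        "max_diff": max(diffs),
        "mean_diff": sum(diffs) / len(diffs),
        "exact_match_pct": 100.0 * sum(d == 0.0 for d in diffs) / len(diffs),
    }


def run_spread(runs: List[Sequence[float]]) -> float:
    """Largest element-wise deviation of any run from the first run."""
    first = runs[0]
    return max(abs(a - b) for run in runs for a, b in zip(first, run))


cpu = [0.0, 0.5, 1.0, 0.25]
npu = [0.0, 0.5, 1.0, 0.25]
print(compare_outputs(cpu, npu))
print(run_spread([cpu] * 5))
```

A spread of zero across repeated runs corresponds to the determinism row; a nonzero value would indicate run-to-run variation.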

Quick Start

Prerequisites

pip install paddlepaddle paddle2onnx opencv-python
pip install torch torch_npu  # Ascend NPU only
pip install modelscope

Download the Model

modelscope download --model PaddlePaddle/PP-DocLayoutV3

Run Inference

# Single image inference
python3 inference.py --image document.jpg --output result.jpg

# With JSON output
python3 inference.py --image document.jpg --json detections.json

# CPU-only mode
python3 inference.py --image document.jpg --device cpu

# Benchmark mode
python3 inference.py --image document.jpg --benchmark 50

Python API

import cv2
from inference import PP_DocLayoutV3

# Initialize model on NPU
model = PP_DocLayoutV3(device="npu")

# Run detection
image = cv2.imread("document.jpg")
detections, elapsed = model.detect(image, threshold=0.5)

for det in detections:
    print(f"[{det['label']}] {det['score']:.3f} at {det['bbox']}")
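
Detections usually need a second pass before they are useful, for example to keep only confident tables or to bucket results by type. The helpers below are a sketch that assumes each detection is a dict with the `label`, `score`, and `bbox` keys used in the loop above. The sample values, and the `[x1, y1, x2, y2]` reading of `bbox`, are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def filter_detections(
    detections: Iterable[Dict],
    threshold: float = 0.5,
    labels: Optional[Iterable[str]] = None,
) -> List[Dict]:
    """Keep detections scoring at least `threshold`, optionally restricted to `labels`."""
    wanted = set(labels) if labels is not None else None
    kept = []
    for det in detections:
        if det["score"] < threshold:
            continue
        if wanted is not None and det["label"] not in wanted:
            continue
        kept.append(det)
    return kept


def group_by_label(detections: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Bucket detections by their label."""
    groups = defaultdict(list)
    for det in detections:
        groups[det["label"]].append(det)
    return dict(groups)


# Made-up sample data; bbox is assumed to be [x1, y1, x2, y2].
sample = [
    {"label": "table", "score": 0.92, "bbox": [40, 300, 760, 520]},
    {"label": "text", "score": 0.31, "bbox": [40, 60, 760, 120]},
    {"label": "doc_title", "score": 0.88, "bbox": [200, 10, 600, 50]},
]
print(sorted(group_by_label(filter_detections(sample))))
```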

Evaluation

Run the Evaluation Suite

# Full evaluation (accuracy + performance)
python3 evaluate.py --image test_document.jpg

# Performance benchmark (NPU vs CPU)
python3 benchmark.py --warmup 10 --measure 100

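Performance Benchmark

Metric	CPU preprocessing	NPU preprocessing	Unit
Mean latency	4464.2	4481.0	ms
P95 latency	4474.6	4520.0	ms
Throughput	0.2	0.2	FPS

Note: Model inference runs on the CPU through PaddlePaddle (about 4.5 s per forward pass), so the two preprocessing paths show nearly identical end-to-end latency. Full NPU acceleration requires converting the model to the Ascend OM format via ONNX → ATC. The NPU preprocessing path matches the CPU preprocessing path exactly (100% match).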
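
The three reported metrics are related: throughput is the reciprocal of mean latency, so a mean of 4464.2 ms gives 1000 / 4464.2 ≈ 0.2 FPS. The sketch below derives all three from raw latency samples. It uses the nearest-rank definition of P95, which is an assumption; benchmark.py may interpolate differently.

```python
import statistics
from typing import Dict, Sequence


def summarize_latencies(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Mean latency, nearest-rank P95, and throughput from per-run latencies in ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank P95: the ceil(0.95 * n)-th smallest sample, in integer arithmetic.
    rank = (95 * n + 99) // 100
    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": mean_ms,
        "p95_ms": ordered[rank - 1],
        "fps": 1000.0 / mean_ms,
    }


# Throughput implied by the CPU-preprocessing mean latency in the table above.
print(round(1000.0 / 4464.2, 1))  # 0.2
```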

Production NPU Deployment

To run the model computation itself on the NPU, not just preprocessing, follow these steps:

1. Export to ONNX

# Convert PaddlePaddle model to ONNX
paddle2onnx \
    --model_dir /path/to/PP-DocLayoutV3 \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --save_file model.onnx \
    --opset_version 14

2. Convert ONNX to OM (Ascend Offline Model)

atc --model=model.onnx \
    --framework=5 \
    --output=model.om \
    --soc_version=Ascend910 \
    --input_shape="image:1,3,800,800;im_shape:1,2;scale_factor:1,2" \
    --input_format=ND
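
The `--input_shape` value is a semicolon-separated list of `name:dims` pairs, which is easy to mistype by hand. A small helper can build it from a mapping, as sketched below; the helper name `build_input_shape` is hypothetical, and the input names are the ones used in the command above.

```python
from typing import Dict, Sequence


def build_input_shape(shapes: Dict[str, Sequence[int]]) -> str:
    """Format {name: dims} as the 'name:d1,d2;name2:d1,d2' string passed to atc."""
    parts = []
    for name, dims in shapes.items():
        dim_text = ",".join(str(d) for d in dims)
        parts.append(f"{name}:{dim_text}")
    return ";".join(parts)


spec = build_input_shape(
    {
        "image": (1, 3, 800, 800),
        "im_shape": (1, 2),
        "scale_factor": (1, 2),
    }
)
print(spec)  # image:1,3,800,800;im_shape:1,2;scale_factor:1,2
```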

3. NPU Inference with ACL

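import acl

# Initialize the ACL runtime and select a device (device 0 is assumed here)
ret = acl.init()
ret = acl.rt.set_device(0)
# Load the OM model; the call returns (model_id, ret)
model_id, ret = acl.mdl.load_from_file("model.om")
# Create the input and output datasets; device buffers must be attached
# to each one (acl.mdl.add_dataset_buffer) before execution
input_dataset = acl.mdl.create_dataset()
output_dataset = acl.mdl.create_dataset()
# Run inference on the NPU
ret = acl.mdl.execute(model_id, input_dataset, output_dataset)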

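Known limitation: the PaddlePaddle 3.x PIR format does not support direct ONNX export through paddle.onnx.export. To convert to ONNX, re-export the model in the non-PIR format from the original PaddleDetection training pipeline.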
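
Before attempting the export, it can help to check which format a model directory holds. The sketch below goes by file names alone. It assumes that a non-PIR export contains `inference.pdmodel` (the file name used in the paddle2onnx command above) and that a PIR export contains `inference.json`; the latter is an assumption about the usual PaddlePaddle 3.x naming and is not stated in this document.

```python
import tempfile
from pathlib import Path


def detect_export_format(model_dir: str) -> str:
    """Guess the export format of a model directory from its file names."""
    root = Path(model_dir)
    if (root / "inference.pdmodel").is_file():
        return "legacy"  # non-PIR export, the form paddle2onnx is pointed at above
    if (root / "inference.json").is_file():
        return "pir"  # assumed PIR naming; re-export before converting to ONNX
    return "unknown"


# Demonstration on a throwaway directory that mimics a PIR export.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "inference.json").write_text("{}")
    demo_format = detect_export_format(tmp)
print(demo_format)  # pir
```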

Project Structure

PP-DocLayoutV3-NPU/
├── inference.py          # Main inference script with NPU support
├── evaluate.py           # Accuracy & performance evaluation suite
├── benchmark.py          # NPU vs CPU performance comparison
├── README.md             # This document
├── evaluation_report.json    # Accuracy evaluation results
└── benchmark_results.json    # Performance benchmark results

Citation

If you find this NPU adaptation useful, please cite the original PP-DocLayoutV3 model:

@article{paddleocr2025,
  title={PaddleOCR-VL-1.5: High-Precision Document Parsing with Visual Language Models},
  journal={arXiv preprint arXiv:2510.14528},
  year={2025}
}

License

This NPU adaptation is released under the Apache 2.0 license of the original model.