
gcw_IDzXRVNw/cv_resnet-transformer_table-structure-recognition_lore-ascend

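cv_resnet-transformer_table-structure-recognition_lore Ascend NPU Deployment Guide

Overview

cv_resnet-transformer_table-structure-recognition_lore is LORE, a lineless table structure recognition model open-sourced by Alibaba DAMO Academy. Built on a ResNet+Transformer architecture, it takes an image of a table without ruling lines and recognizes each cell's physical position (its coordinates in the image) and its logical coordinates (row and column indices).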

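Features

Inference on Ascend NPU via torch_npu
CPU vs. NPU precision comparison test (relative error < 1%)
ResNet50 backbone + Transformer processor
Cell detection and localization for lineless tables
Heatmap output for cell center-point detection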

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
OpenCV: for image preprocessing
Docker: container name test-modelagent

Directory Structure

cv_resnet-transformer_table-structure-recognition_lore-ascend/
├── inference.py              # Inference and test script
├── log.txt                   # Test log
├── README.md                 # This document
├── inference_result.json     # Inference results
├── precision_result.json     # Precision test results
├── result_visualization.jpg  # Visualization of detection results
└── test_table.jpg            # Test image

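Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

3. Prepare the model files

The model files are located in the /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore/iic/cv_resnet-transformer_table-structure-recognition_lore/ directory:

pytorch_model.pt - PyTorch model weights
configuration.json - model configuration
README.md - model description

4. Install dependencies (use the torch_npu release that matches your installed PyTorch and CANN versions)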

pip install opencv-python torch_npu
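Before running the script, it can help to confirm that torch_npu imports cleanly and that an NPU is visible, which is what the test log's `NPU available: True` line reports. The helper below is a hypothetical sketch (the name `select_device` is not taken from inference.py); it falls back to CPU when torch_npu is not installed.

```python
import importlib.util


def select_device() -> str:
    """Return "npu:0" when torch_npu is installed and an NPU is visible, else "cpu"."""
    # Look for torch_npu without importing it, so this also runs on CPU-only hosts.
    if importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    import torch
    import torch_npu  # noqa: F401  (importing it registers the torch.npu namespace)

    return "npu:0" if torch.npu.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", select_device())
```

If importing torch_npu raises an error inside the container, check that set_env.sh was sourced in the same shell and that the torch and torch_npu versions match.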

Usage

Mode 1: Standard inference

Run the inference script to recognize the table structure:

cd /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore-ascend/

python3 inference.py

Mode 2: Precision test (CPU vs. NPU)

Run the precision comparison to verify that the NPU results agree with the CPU results:

cd /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore-ascend/

python3 inference.py precision_test

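Command-line arguments

Argument	Description	Default
`precision_test`	Optional first positional argument; runs the full CPU vs. NPU precision test. When omitted, the script runs in standard inference mode.	`normal`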
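The mode can be read as an optional first positional argument. The sketch below shows one way such parsing might look; the names `parse_mode` and `VALID_MODES` are illustrative, and whether inference.py also accepts `normal` explicitly is an assumption.

```python
import sys

# Hypothetical set of accepted modes; "normal" mirrors the documented default.
VALID_MODES = ("normal", "precision_test")


def parse_mode(argv: list[str]) -> str:
    """Return the run mode from the first positional argument, defaulting to "normal"."""
    mode = argv[1] if len(argv) > 1 else "normal"
    if mode not in VALID_MODES:
        raise SystemExit(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return mode


if __name__ == "__main__":
    print("Mode:", parse_mode(sys.argv))
```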

Test Verification

Precision test results

Metric	Measured	Threshold	Status
Heatmap relative error	0.0557%	< 1%	PASS
Width/Height relative error	0.0371%	< 1%	PASS
Cell count match	100 vs 100	Must be equal	PASS
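The pass/fail rules in this table reduce to three comparisons. A minimal sketch, assuming the 1% threshold is applied to each relative error independently and the CPU and NPU cell counts must match exactly; `check_precision` is a hypothetical name, not a function from inference.py.

```python
def check_precision(hm_rel_err: float, wh_rel_err: float,
                    cells_cpu: int, cells_npu: int,
                    threshold: float = 0.01) -> dict[str, bool]:
    """Apply the README's criteria: relative errors below 1% and equal cell counts."""
    results = {
        "heatmap": hm_rel_err < threshold,
        "width_height": wh_rel_err < threshold,
        "cell_count": cells_cpu == cells_npu,
    }
    # Overall status passes only if every individual check passes.
    results["overall"] = all(results.values())
    return results
```

With the logged values, `check_precision(5.571335e-04, 3.709877e-04, 100, 100)` reports True for every check, consistent with the log's `Overall Status: PASS`.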

Performance

Item	Value
CPU inference time	0.9957 s
NPU inference time	4.0292 s
Speedup (CPU time / NPU time)	0.25x
Detected cells	100
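The speedup figure is the ratio of CPU time to NPU time, so a value below 1 means the NPU run was slower:

```latex
\text{speedup} = \frac{t_{\text{CPU}}}{t_{\text{NPU}}}
             = \frac{0.9957\,\text{s}}{4.0292\,\text{s}}
             \approx 0.247 \approx 0.25\times
```

In other words, this particular NPU run took about 4.05 times as long as the CPU run.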

Test log (log.txt)

============================================================
LORE Lineless Table Structure Recognition - Ascend NPU Test
============================================================
Output: /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore-ascend

Mode: PRECISION TEST
NPU available: True
Model: /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore/iic/cv_resnet-transformer_table-structure-recognition_lore/pytorch_model.pt (exists: True)

============================================================
Loading Test Image
============================================================
Original image shape: (327, 640, 3)
Input tensor shape: torch.Size([1, 3, 384, 384])

============================================================
Building Model
============================================================
Loading state dict...
Model built successfully

============================================================
Running CPU Inference
============================================================
CPU inference time: 0.9957s
HM output shape: torch.Size([1, 2, 96, 96])
WH output shape: torch.Size([1, 8, 96, 96])
Detected cells (CPU): 100

============================================================
Running NPU Inference
============================================================
NPU inference time: 4.0292s
Speedup: 0.25x
Detected cells (NPU): 100

============================================================
Precision Test Results
============================================================
Heatmap max diff: 9.712500e+02
Width/Height max diff: 3.601685e-01
Heatmap relative error: 5.571335e-04 (0.0557%)
Width/Height relative error: 3.709877e-04 (0.0371%)
Heatmap: PASS
Width/Height: PASS
Cell count match: PASS

Overall Status: PASS

============================================================
Saving Results
============================================================
Visualization saved: /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore-ascend/result_visualization.jpg

============================================================
Test Complete!
============================================================

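Model Architecture

Architecture: ResNet50 backbone + Transformer processor
Backbone: ResNet50 (4 stages producing multi-level features)
Feature adaptation: 1x1 convolutions map the multi-scale features to 256 channels
Detection heads: hm (heatmap), wh (width/height), ax (auxiliary)
Input size: 3 x 384 x 384
Output: Heatmap (2, 96, 96), Width/Height (8, 96, 96), i.e. an output stride of 4 relative to the 384 x 384 input

Component	Description
ResNet Backbone	4 stages; extracts multi-scale features
Adaption Layers	1x1 convolutions; adapt the channel dimension to 256
Heatmap Head	2-channel heatmap; detects cell center points
Size Head	8 channels; predicts cell width/height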
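Because the 96 x 96 output maps onto the 384 x 384 input, each heatmap position covers a 4 x 4 pixel patch. The sketch below maps a grid position back to the original image. It assumes preprocessing is a plain resize to 384 x 384 without padding or aspect-ratio preservation, which this README does not confirm, and `grid_to_original` is a hypothetical helper.

```python
INPUT_SIZE = 384   # model input side length (from this README)
OUTPUT_SIZE = 96   # heatmap side length (from this README)
STRIDE = INPUT_SIZE // OUTPUT_SIZE  # 4 input pixels per heatmap position


def grid_to_original(col: int, row: int, orig_w: int, orig_h: int) -> tuple[float, float]:
    """Map a heatmap grid position to pixel coordinates in the original image.

    ASSUMPTION: the image was resized directly to INPUT_SIZE x INPUT_SIZE
    (no padding); adjust the scaling if inference.py preprocesses differently.
    """
    x_in, y_in = col * STRIDE, row * STRIDE  # coordinates in the 384x384 model input
    return x_in * orig_w / INPUT_SIZE, y_in * orig_h / INPUT_SIZE
```

For the test image (the log's shape `(327, 640, 3)` is height, width, channels), grid position (48, 48) maps to (320.0, 163.5) under this assumption.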

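Inference Output

The model returns the following tensors:

hm: heatmap (batch, 2, H, W); channel 0 is the background probability, channel 1 the center-point probability
wh: width/height information (batch, 8, H, W); cell size
ax: auxiliary features (batch, 256, H, W); an additional feature map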
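The wh head has 8 channels, more than a single width and height. One plausible reading, common in CenterNet-derived quadrilateral detectors, is four (dx, dy) offsets from the cell center to its four corners. The sketch below decodes values under that assumption; the channel ordering, the sign convention, and the name `decode_corners` are all hypothetical and should be checked against the original LORE code.

```python
def decode_corners(cx: float, cy: float, wh8: list[float]) -> list[tuple[float, float]]:
    """Turn the 8 wh values at one location into 4 corner points around a center.

    ASSUMPTION: wh8 = [dx0, dy0, dx1, dy1, dx2, dy2, dx3, dy3] and each
    corner is center - offset. Verify ordering and signs before relying on it.
    """
    if len(wh8) != 8:
        raise ValueError("expected 8 values")
    return [(cx - wh8[2 * i], cy - wh8[2 * i + 1]) for i in range(4)]
```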

Each detected cell contains:

center: [x, y] center-point coordinates (pixels)
score: confidence score
grid_pos: [col, row] grid position

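Precision Test Details

The precision test compares the CPU and NPU outputs:

Test item	CPU output	NPU output	Difference	Relative error
Heatmap (max abs. diff)	-	-	971.25	0.0557%
Width/Height (max abs. diff)	-	-	0.36	0.0371%
Cell count	100	100	0	N/A (PASS)

Note: The absolute heatmap difference is large (971.25) because the heatmap values span a wide numeric range; the relative error nevertheless stays below 0.06%, well within the 1% threshold.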
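A large absolute difference can coexist with a small relative error because the error is normalized by the magnitude of the reference output. This README does not state the exact formula inference.py uses; the sketch below assumes one common definition, the summed absolute difference divided by the summed absolute CPU (reference) values. The name `relative_error` is hypothetical.

```python
def relative_error(reference: list[float], candidate: list[float]) -> float:
    """Sum of |reference - candidate| divided by the sum of |reference|.

    ASSUMPTION: one common definition; inference.py may normalize differently
    (for example by max |reference|).
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_diff = sum(abs(r - c) for r, c in zip(reference, candidate))
    abs_ref = sum(abs(r) for r in reference)
    if abs_ref == 0:
        # An all-zero reference only matches an all-zero candidate.
        return 0.0 if abs_diff == 0 else float("inf")
    return abs_diff / abs_ref
```

For example, reference [1000, 2000, 3000] against [1001, 1999, 3000] has a maximum absolute difference of 1 but a relative error of about 0.033%.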

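FAQ

Q: Why is the NPU slower than the CPU?

A: The 0.25x figure should not be read as steady-state NPU performance. If the NPU time comes from a single call, it likely includes one-time costs such as operator compilation, memory allocation, and host-to-device data transfer, which are amortized over repeated runs. Some operators (for example, parts of the Transformer encoder) may also lack well-optimized NPU implementations. Finally, a single 384 x 384 image at batch size 1 is too small a workload to saturate the NPU's parallel compute; its advantage typically shows with larger batches.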
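A fairer CPU vs. NPU comparison discards warm-up runs and waits for the device to finish before stopping the clock. The sketch below is a generic timing helper; the name `benchmark` and the default warm-up and iteration counts are hypothetical choices, not taken from inference.py.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], sync: Callable[[], None] = lambda: None,
              warmup: int = 3, iters: int = 10) -> float:
    """Return the median wall-clock time of fn() in seconds, excluding warm-up runs.

    `sync` should block until queued device work has finished; without it,
    asynchronous device execution can make the measured time too short.
    """
    for _ in range(warmup):  # first calls may include one-time setup (compilation, allocation)
        fn()
    sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # make sure the device work for this call is done before reading the clock
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

On Ascend, a call might look like `benchmark(lambda: model(x), sync=torch.npu.synchronize)` inside `torch.no_grad()`, where `model` and `x` stand for the already-loaded model and input tensor on the NPU; for the CPU run, the default no-op `sync` suffices.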

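Q: Why are so many cells detected?

A: This is expected. The model predicts a heatmap that assigns every location a probability of containing a cell center, and detections are kept after filtering with a score threshold (default 0.3), which leaves 100 here. Because exactly 100 is also the usual top-K limit in CenterNet-style decoders, the count may reflect such a cap rather than the threshold alone; check the settings in inference.py. Adjust the threshold to suit your application.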
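Threshold-based filtering of this kind is typically implemented CenterNet-style: keep only the local maxima of the center-point heatmap, drop those below the threshold, and cap the result at the top K scores. The pure-Python sketch below shows that logic on a single 2-D channel. The 0.3 default comes from this README; the function name and the K=100 cap are assumptions, not taken from inference.py, and the scores are assumed to already be probabilities.

```python
def decode_heatmap(hm: list[list[float]], threshold: float = 0.3, k: int = 100):
    """Return up to k peaks as (score, col, row) from a 2-D center-point heatmap.

    A location is kept if it is the maximum of its 3x3 neighbourhood (the
    max-pool "NMS" used by CenterNet-style decoders) and its score >= threshold.
    """
    h, w = len(hm), len(hm[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            s = hm[r][c]
            if s < threshold:
                continue
            # 3x3 window clipped at the borders; it includes the location itself.
            neighbours = (hm[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2)))
            if s >= max(neighbours):
                peaks.append((s, c, r))
    peaks.sort(key=lambda p: p[0], reverse=True)  # highest score first
    return peaks[:k]
```

Tensor implementations usually do the same with a 3 x 3 `max_pool2d` and `topk`; to try this version, pass the center-point channel as nested lists (for example via `tensor.tolist()`) after whatever activation inference.py applies.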

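Q: How can detection accuracy be improved?

A: Options include:

Tuning the heatmap threshold (threshold parameter)
Applying non-maximum suppression (NMS) to remove overlapping detections
Retraining or fine-tuning the model on data from the target scenario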
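For the NMS suggestion, greedy IoU-based suppression is the standard form. The sketch below works on axis-aligned boxes (x1, y1, x2, y2). Because LORE cells are described here by center points plus size values, building such boxes (for example from the min/max of predicted corners) is an assumption, as is the 0.5 IoU threshold; `iou` and `nms` are hypothetical helper names.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any already-kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

If torchvision is installed, `torchvision.ops.nms(boxes, scores, iou_threshold)` provides an equivalent tensor implementation.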

References

Original model: https://modelscope.cn/models/damo/cv_resnet-transformer_table-structure-recognition_lore
LORE paper: https://arxiv.org/abs/2303.03730
DAMO Academy: https://damo.alibaba.com/

License

This project is released under the Apache-2.0 license.