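`cv_resnet-transformer_table-structure-recognition_lore` is the LORE lineless table structure recognition model open-sourced by Alibaba DAMO Academy. Built on a ResNet + Transformer architecture, it recognizes each cell's physical location (pixel coordinates) and logical location (row and column indices) from images of tables without ruling lines.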

    cv_resnet-transformer_table-structure-recognition_lore-ascend/
    ├── inference.py                # Inference test script
    ├── log.txt                     # Test log
    ├── README.md                   # This document
    ├── inference_result.json       # Inference results
    ├── precision_result.json       # Precision test results
    ├── result_visualization.jpg    # Visualization of detection results
    └── test_table.jpg              # Test image

Enter the container and load the Ascend toolkit environment:

    docker exec -it test-modelagent bash
    source /usr/local/Ascend/ascend-toolkit/set_env.sh

The model files are located under `/data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore/iic/cv_resnet-transformer_table-structure-recognition_lore/`.
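
Install the Python dependencies:

    pip install opencv-python torch_npu

Run the inference script to perform table structure recognition:

    cd /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore-ascend/
    python3 inference.py

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:

    cd /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore-ascend/
    python3 inference.py precision_test

| Argument | Description | Default |
|---|---|---|
| `precision_test` | Run the full precision test | normal |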
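
A minimal sketch of how a script could map the optional positional argument to a run mode. The helper name `parse_mode`, the `VALID_MODES` tuple, and the error handling are assumptions for illustration; the actual `inference.py` may parse its arguments differently.

```python
VALID_MODES = ("normal", "precision_test")  # assumed set of modes


def parse_mode(argv):
    """Return the run mode from a sys.argv-style list (hypothetical helper)."""
    # argv[0] is the script name; the mode, if given, is argv[1].
    mode = argv[1] if len(argv) > 1 else "normal"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return mode


# `python3 inference.py` would select the default mode,
# `python3 inference.py precision_test` the precision test.
print(parse_mode(["inference.py"]))
print(parse_mode(["inference.py", "precision_test"]))
```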

Precision test results:

| Metric | Measured | Threshold | Status |
|---|---|---|---|
| Heatmap relative error | 0.0557% | < 1% | PASS |
| Width/Height relative error | 0.0371% | < 1% | PASS |
| Cell count match | 100 vs 100 | Equal | PASS |

Performance results:

| Item | Value |
|---|---|
| CPU inference time | 0.9957s |
| NPU inference time | 4.0292s |
| Speedup (CPU time / NPU time) | 0.25x |
| Detected cells | 100 |
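
The speedup figure is consistent with dividing the CPU time by the NPU time (0.9957 / 4.0292 ≈ 0.25), so a value below 1 means the NPU run was slower. The sketch below shows one way such timings could be collected with warm-up runs excluded, since the first call on an accelerator often includes one-time initialization. `time_inference` and `speedup` are hypothetical helpers, not functions taken from `inference.py`.

```python
import time


def time_inference(fn, warmup=1, repeats=3):
    """Return the mean wall-clock time of fn() over `repeats` runs.

    Hypothetical helper: `fn` is any zero-argument callable that runs one
    forward pass and blocks until the result is ready.
    """
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def speedup(cpu_seconds, npu_seconds):
    """Speedup of the NPU relative to the CPU; values below 1 mean slower."""
    return cpu_seconds / npu_seconds


# Using the timings reported in this README:
print(f"{speedup(0.9957, 4.0292):.2f}x")  # prints 0.25x
```

When timing a device that executes asynchronously, the callable would need to wait for the device to finish before returning; otherwise the measured time only covers kernel launch.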

Log output from a precision test run:

    ============================================================
    LORE Lineless Table Structure Recognition - Ascend NPU Test
    ============================================================
    Output: /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore-ascend
    Mode: PRECISION TEST
    NPU available: True
    Model: /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore/iic/cv_resnet-transformer_table-structure-recognition_lore/pytorch_model.pt (exists: True)
    ============================================================
    Loading Test Image
    ============================================================
    Original image shape: (327, 640, 3)
    Input tensor shape: torch.Size([1, 3, 384, 384])
    ============================================================
    Building Model
    ============================================================
    Loading state dict...
    Model built successfully
    ============================================================
    Running CPU Inference
    ============================================================
    CPU inference time: 0.9957s
    HM output shape: torch.Size([1, 2, 96, 96])
    WH output shape: torch.Size([1, 8, 96, 96])
    Detected cells (CPU): 100
    ============================================================
    Running NPU Inference
    ============================================================
    NPU inference time: 4.0292s
    Speedup: 0.25x
    Detected cells (NPU): 100
    ============================================================
    Precision Test Results
    ============================================================
    Heatmap max diff: 9.712500e+02
    Width/Height max diff: 3.601685e-01
    Heatmap relative error: 5.571335e-04 (0.0557%)
    Width/Height relative error: 3.709877e-04 (0.0371%)
    Heatmap: PASS
    Width/Height: PASS
    Cell count match: PASS
    Overall Status: PASS
    ============================================================
    Saving Results
    ============================================================
    Visualization saved: /data/ysws/agentsp/5-19-1/cv_resnet-transformer_table-structure-recognition_lore-ascend/result_visualization.jpg
    ============================================================
    Test Complete!
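    ============================================================

Model components:

| Component | Description |
|---|---|
| ResNet Backbone | 4 stages; extracts multi-scale features |
| Adaption Layers | 1x1 convolutions; adapt the channel dimension to 256 |
| Heatmap Head | 2-channel heatmap; detects cell center points |
| Size Head | 8 channels; predicts cell width and height |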
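
The head channel counts line up with the output shapes in the log: a 384x384 input yields 96x96 maps, which implies an output stride of 4. The sketch below reproduces that shape arithmetic. The stride of 4 is inferred from the logged shapes rather than stated by the model authors, and `output_shapes` is a hypothetical helper.

```python
def output_shapes(batch, height, width, stride=4):
    """Expected head output shapes for an input of (batch, 3, height, width).

    Assumption: both heads read one feature map downsampled by `stride`.
    """
    h, w = height // stride, width // stride
    return {
        "hm": (batch, 2, h, w),  # heatmap head: 2 channels
        "wh": (batch, 8, h, w),  # size head: 8 channels
    }


shapes = output_shapes(1, 384, 384)
print(shapes["hm"])  # (1, 2, 96, 96)
print(shapes["wh"])  # (1, 8, 96, 96)
```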
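
The model outputs are:

- `hm`: heatmap of shape (batch, 2, H, W); channel 0 is the background probability and channel 1 is the center-point probability
- `wh`: width/height information of shape (batch, 8, H, W), giving the cell sizes
- `ax`: auxiliary features of shape (batch, 256, H, W), an extra feature map

Each detected cell contains:

- `center`: [x, y] center-point coordinates in pixels
- `score`: confidence score
- `grid_pos`: [col, row] grid position

The precision test compares the CPU and NPU outputs:
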
| Test item | CPU output | NPU output | Difference | Relative error |
|---|---|---|---|---|
| Heatmap max | - | - | 971.25 | 0.0557% |
| WH max | - | - | 0.36 | 0.0371% |
| Cell count | 100 | 100 | 0 | PASS |
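
Note: the absolute heatmap difference is large (971.25) because the heatmap values span a wide numeric range. The relative error nevertheless stays below 0.06%, which meets the precision requirement.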
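
To illustrate why a large absolute difference can coexist with a small relative error, here is a minimal sketch on synthetic data. The definition used (maximum absolute difference divided by the maximum absolute value of the CPU reference) is an assumption; this README does not state which formula `inference.py` uses, and both helper names are hypothetical.

```python
import numpy as np


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two arrays."""
    return float(np.max(np.abs(reference - candidate)))


def relative_error(reference, candidate, eps=1e-12):
    """Assumed definition: max |ref - cand| / max |ref|."""
    scale = float(np.max(np.abs(reference)))
    return max_abs_diff(reference, candidate) / (scale + eps)


# Synthetic data only: a wide-range reference and a slightly perturbed copy.
reference = np.array([0.0, 2.0e5, -1.0e6, 5.0e5])
candidate = reference + np.array([0.0, 100.0, -500.0, 250.0])

print(max_abs_diff(reference, candidate))            # 500.0
print(relative_error(reference, candidate) < 0.01)   # True
```

An absolute difference of 500 looks large in isolation, but against values on the order of one million it amounts to a relative error of about 0.05%, well under a 1% threshold.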
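
A: The NPU run is slower than the CPU run here (4.0292s vs. 0.9957s), probably because some of the model's more complex operations, such as the Transformer encoder, are less well optimized on the NPU than on the CPU. The NPU's advantage is expected to appear with smaller models or simpler operations, and for large image models its parallel compute should pay off mainly when processing larger batches.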
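
A: Detecting 100 cells is expected. The model outputs a heatmap that gives a center-point probability at every location; after threshold filtering (default 0.3), 100 detections remain. In practice, the threshold can be adjusted to suit the application.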
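
The threshold filtering described above can be sketched as follows on a synthetic heatmap. This is an illustration under stated assumptions, not the decoding code from `inference.py`: `filter_centers` is a hypothetical helper, and the stride of 4 used to map heatmap indices back to input pixels is inferred from the logged 384-to-96 shape ratio.

```python
import numpy as np


def filter_centers(center_prob, threshold=0.3, stride=4):
    """Keep heatmap locations whose probability exceeds `threshold`.

    center_prob: 2-D array (H, W) of center-point probabilities.
    Returns a list of dicts sorted by descending score; coordinates are
    multiplied by `stride` (assumed) to map back to input pixels.
    """
    ys, xs = np.nonzero(center_prob > threshold)
    scores = center_prob[ys, xs]
    order = np.argsort(-scores)  # highest score first
    return [
        {"center": [int(xs[i]) * stride, int(ys[i]) * stride],
         "score": float(scores[i])}
        for i in order
    ]


# Synthetic 4x4 heatmap with two confident peaks.
prob = np.zeros((4, 4))
prob[1, 2] = 0.9
prob[3, 0] = 0.5
prob[0, 0] = 0.2  # below the default threshold, so it is dropped

cells = filter_centers(prob)
print([c["center"] for c in cells])               # [[8, 4], [0, 12]]
print(len(filter_centers(prob, threshold=0.6)))   # 1
```

Raising the threshold reduces the number of detections, which is the adjustment the answer above refers to. A full decoder would typically also keep only local maxima so that neighboring pixels of the same peak are not reported as separate cells.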

A: This can be done in the following ways:

This project is licensed under the Apache-2.0 license.