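cv_F3Net_product-segmentation is a product segmentation model based on the F3Net (Fusion, Feedback and Focus) architecture, used for salient object detection and product image segmentation. The model takes a product promotional image as input and outputs a segmentation mask, which makes it suitable for segmenting product display images in e-commerce.
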
cv_F3Net_product-segmentation-ascend/
├── inference.py # Inference test script
├── log.txt # Test log (complete)
├── README.md # This document
├── test_sample.pt # Test sample
├── test_segmentation.jpg # Test image
├── test_sample_info.json # Test sample metadata
├── inference_result.json # Inference results
└── precision_result.json # Accuracy test results

docker exec -it test-modelagent bash

source /usr/local/Ascend/ascend-toolkit/set_env.sh

The model files are located in the /data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation/iic/cv_F3Net_product-segmentation/ directory.

pip install torch torch_npu pillow -i https://pypi.huaweicloud.com/repository/pypi/simple/

Run the inference script for NPU inference testing:
cd /data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation-ascend/
# Standard inference mode
python3 inference.py
# The first run incurs model loading and compilation overhead

Run the accuracy comparison test to verify that the NPU results are consistent with the CPU results:
cd /data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation-ascend/
# Accuracy test mode
python3 inference.py precision_test

| Metric | Measured value | Threshold | Status |
|---|---|---|---|
| Relative error (p1) | 7.57e-06 | < 1% | PASS |
| Relative error (p2) | 6.48e-06 | < 1% | PASS |
| Relative error (r2) | 1.68e-04 | < 1% | PASS |
| Relative error (r3) | 1.36e-04 | < 1% | PASS |
| Relative error (r4) | 1.55e-04 | < 1% | PASS |
| Relative error (r5) | 2.15e-04 | < 1% | PASS |
| Maximum relative error | 2.15e-04 (0.0215%) | < 1% | PASS |

| Operation | Time |
|---|---|
| CPU inference time (single run) | 1.2613s |
| NPU inference time (single run) | 0.0242s |
| First NPU inference (including compilation) | 4.1355s |
| Speedup | 52.17x |
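
As a quick sanity check, the speedup and the one-time warm-up cost can be recomputed from the rounded timings in the table above. This is only an arithmetic sketch with illustrative variable names. The logged 52.17x was presumably computed from unrounded timings, so the rounded inputs land slightly lower.

```python
# Timings copied from the table (seconds, rounded to 4 decimals)
cpu_time_s = 1.2613
npu_time_s = 0.0242
first_npu_time_s = 4.1355

# Speedup of a warmed-up NPU run over a single CPU run
speedup = cpu_time_s / npu_time_s

# Extra cost paid only on the first NPU call (compilation, operator loading)
warmup_overhead_s = first_npu_time_s - npu_time_s

print(f"Speedup from rounded timings: {speedup:.2f}x")
print(f"First-run overhead: {warmup_overhead_s:.4f}s")
```

The first-run overhead is amortized quickly: it only matters for workloads that run a handful of images per process.
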

| Output | Output shape | Description |
|---|---|---|
| p1 | [1, 1, 24, 24] | Final prediction (upsampled) |
| p2 | [1, 1, 24, 24] | Final prediction (upsampled) |
| r2 | [1, 1, 12, 12] | Side output, level 2 |
| r3 | [1, 1, 12, 12] | Side output, level 3 |
| r4 | [1, 1, 12, 12] | Side output, level 4 |
| r5 | [1, 1, 12, 12] | Side output, level 5 |

Result: the maximum relative error between the CPU and NPU outputs is only 0.0215%, well below the 1% threshold, so the accuracy check passes.

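
The log reports one "max rel err" value per output, and the overall figure is the largest of them. The exact formula used by inference.py is not shown in this document, so the sketch below assumes an element-wise definition, |cpu - npu| / (|cpu| + eps). The function names and the `eps` guard are hypothetical.

```python
def max_relative_error(reference, candidate, eps=1e-12):
    """Largest element-wise relative error of candidate against reference.

    Assumed definition: |ref - cand| / (|ref| + eps). The eps term only
    guards against division by zero.
    """
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    return max(abs(r - c) / (abs(r) + eps) for r, c in zip(reference, candidate))


def passes_threshold(per_output_errors, threshold=0.01):
    """Return (worst error, pass flag) against a 1% default threshold."""
    worst = max(per_output_errors.values())
    return worst, worst < threshold


# Per-output values copied from the log
logged_errors = {
    "p1": 7.568317e-06, "p2": 6.478573e-06,
    "r2": 1.682996e-04, "r3": 1.358213e-04,
    "r4": 1.552435e-04, "r5": 2.147764e-04,
}
worst, ok = passes_threshold(logged_errors)
print(f"Max relative error: {worst:.6e} ({worst * 100:.4f}%), pass={ok}")
```

To apply `max_relative_error` to real model outputs, each tensor could be flattened first, for example with `tensor.flatten().tolist()` after moving it to the CPU.
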
============================================================
F3Net NPU Test Suite
Output: /data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation-ascend
============================================================
Mode: PRECISION TEST
============================================================
F3Net NPU Inference Test
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation/iic/cv_F3Net_product-segmentation/pytorch_model.bin
Test image: /data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation/iic/cv_F3Net_product-segmentation/test_segmentation.jpg
Loading state dict...
Loaded 694 entries
Building F3Net model...
Model built successfully
Input shape: torch.Size([1, 3, 384, 384]), original size: (800, 800)
Inference time: 4.1355s
Output p1: torch.Size([1, 1, 24, 24])
Output p2: torch.Size([1, 1, 24, 24])
Output r2: torch.Size([1, 1, 12, 12])
Output r3: torch.Size([1, 1, 12, 12])
Output r4: torch.Size([1, 1, 12, 12])
Output r5: torch.Size([1, 1, 12, 12])
============================================================
Creating Test Samples
============================================================
Saved: /data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation-ascend/test_sample.pt
Copied test image to: /data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation-ascend/test_segmentation.jpg
============================================================
F3Net Precision Test (CPU vs NPU)
============================================================
Device: npu:0
Loading state dict...
Building CPU model...
Building NPU model...
Input shape: torch.Size([1, 3, 384, 384])
Running on CPU...
CPU time: 1.2613s
CPU output p1: torch.Size([1, 1, 24, 24])
CPU output p2: torch.Size([1, 1, 24, 24])
CPU output r2: torch.Size([1, 1, 12, 12])
CPU output r3: torch.Size([1, 1, 12, 12])
CPU output r4: torch.Size([1, 1, 12, 12])
CPU output r5: torch.Size([1, 1, 12, 12])
Running on NPU...
NPU time: 0.0242s
NPU output p1: torch.Size([1, 1, 24, 24])
NPU output p2: torch.Size([1, 1, 24, 24])
NPU output r2: torch.Size([1, 1, 12, 12])
NPU output r3: torch.Size([1, 1, 12, 12])
NPU output r4: torch.Size([1, 1, 12, 12])
NPU output r5: torch.Size([1, 1, 12, 12])
p1 max rel err: 7.568317e-06
p2 max rel err: 6.478573e-06
r2 max rel err: 1.682996e-04
r3 max rel err: 1.358213e-04
r4 max rel err: 1.552435e-04
r5 max rel err: 2.147764e-04
Speedup: 52.17x
Max relative error: 2.147764e-04 (0.0215%)
Threshold: 1.0%
Status: PASS
============================================================
Test Complete!
============================================================

F3Net model architecture:

| Layer | Output channels | Output size (384x384 input) |
|---|---|---|
| conv1 | 64 | 192x192 |
| layer1 (3 blocks) | 256 | 96x96 |
| layer2 (4 blocks) | 512 | 48x48 |
| layer3 (6 blocks) | 1024 | 24x24 |
| layer4 (3 blocks) | 2048 | 12x12 |
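
The block counts (3, 4, 6, 3) and channel widths in this table match a ResNet-50 style backbone, where each stage halves the spatial resolution. The sketch below reproduces the size column from cumulative strides. The stride values are inferred from the table (384 / 192 = 2, 384 / 96 = 4, and so on), not read from the model code, and the names are illustrative.

```python
INPUT_SIZE = 384

# Cumulative downsampling factor at each stage, inferred from the table above
CUMULATIVE_STRIDE = {
    "conv1": 2,
    "layer1": 4,
    "layer2": 8,
    "layer3": 16,
    "layer4": 32,
}


def feature_map_sizes(input_size):
    """Spatial side length of each stage's output for a square input."""
    return {name: input_size // stride for name, stride in CUMULATIVE_STRIDE.items()}


for name, size in feature_map_sizes(INPUT_SIZE).items():
    print(f"{name}: {size}x{size}")
```

Floor division is exact here because 384 is divisible by 32. For other input sizes the real layers may round differently, so treat the result as an estimate.
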

| Layer | Input channels | Output channels |
|---|---|---|
| squeeze2 | 256 | 64 |
| squeeze3 | 512 | 64 |
| squeeze4 | 1024 | 64 |
| squeeze5 | 2048 | 64 |

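import numpy as np
import torch
import torch_npu  # registers the "npu" device with PyTorch
from PIL import Image

# Assumption: F3Net is the model class defined in this repository's inference.py
from inference import F3Net

MODEL_PATH = "/data/ysws/agentsp/5-19-1/cv_F3Net_product-segmentation/iic/cv_F3Net_product-segmentation/pytorch_model.bin"
IMAGE_PATH = "path/to/image.jpg"

# Build the model from the saved state dict, then move it to the NPU
model = F3Net(torch.load(MODEL_PATH, map_location="cpu"))
model = model.to("npu:0")
model.eval()

# Load the image and resize it to the 384x384 input used in the logged test run
img = Image.open(IMAGE_PATH).convert('RGB')
img = img.resize((384, 384), Image.BILINEAR)

# Scale to [0, 1], then normalize with the ImageNet mean and standard deviation
img_tensor = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 255.0
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
img_tensor = (img_tensor - mean) / std
img_tensor = img_tensor.unsqueeze(0).to("npu:0")

with torch.no_grad():
    output = model(img_tensor)
mask = output['p1']  # or any of the other output keys
print(f"Mask shape: {mask.shape}")  # torch.Size([1, 1, H, W])

import torch.nn.functional as F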
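output = model(img_tensor)
p1 = output['p1']
p2 = output['p2']
# Average the two final predictions
mask = (p1 + p2) / 2
# Upsample to the original image size
mask = F.interpolate(mask, size=(800, 800), mode='bilinear', align_corners=False)
mask = (mask.squeeze() > 0.5).cpu().numpy()
# mask is a binary segmentation map
print(f"Mask shape: {mask.shape}")  # (800, 800)

A: Check that the NPU driver is installed correctly and make sure the CANN environment variables have been sourced. A numerical error of 0.01-0.1% is normal, because the NPU and CPU compute at different precisions.
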
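A: The first inference requires JIT compilation and operator loading, which takes about 4 seconds. Subsequent inferences are much faster (about 0.024 seconds).

A: p1 and p2 are the final predictions, and r2-r5 are side outputs. Typically p1 or p2 is used as the final segmentation result.

A: Resize the image to 384x384 (preserving the aspect ratio), then resize the mask back to the original size.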
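
For non-square images, one common way to reach 384x384 while preserving the aspect ratio is to scale the longer side to 384 and pad the rest. Whether inference.py pads or simply stretches is not stated here (the logged test image is square, so the question does not arise for it), so the helper below is a hypothetical sketch of the geometry only.

```python
def letterbox_geometry(width, height, target=384):
    """Return (new_w, new_h, pad_left, pad_top) for fitting an image into a
    target x target square without changing its aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    # Scale so that the longer side becomes exactly `target` pixels
    new_w = max(1, round(width * target / longest))
    new_h = max(1, round(height * target / longest))
    # Center the resized image; the remaining border is padding
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


print(letterbox_geometry(800, 800))  # square image: no padding needed
print(letterbox_geometry(800, 400))  # wide image: padded above and below
```

When mapping the mask back, the padded border would be cropped away with the same offsets before resizing to the original width and height.
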

This project is licensed under the Apache-2.0 license.