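vitmatte-small-composition-1k is an image matting model based on the Vision Transformer (ViT), trained on the Composition-1k dataset. It was proposed by Yao et al. in the paper "ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers" and accurately estimates the foreground object in an image for high-precision matting.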

Directory layout:

```
vitmatte-small-composition-1k-ascend/
├── inference.py            # Inference test script
├── log.txt                 # Test log
├── README.md               # This document
├── test_image.npy          # Test image
├── test_trimap.npy         # Test trimap
├── test_sample.txt         # Test sample description
├── inference_result.json   # Inference result
└── precision_result.json   # Precision test result
```

Enter the container and load the Ascend toolkit environment:

```bash
docker exec -it test-modelagent bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

The model files are located under `/data/ysws/agentsp/5-16/vitmatte-small-composition-1k/hustvl/vitmatte-small-composition-1k/`.

Install the dependencies:

```bash
pip install transformers torch_npu -i https://pypi.huaweicloud.com/repository/pypi/simple/
```

Run the inference script for image matting:

```bash
cd /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend/
python3 inference.py --mode inference
```

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:

```bash
cd /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend/
python3 inference.py --mode precision_test
```

Run all tests:

```bash
cd /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend/
python3 inference.py --mode all
```

| Parameter | Description | Default |
|---|---|---|
| `--mode` | Test mode: `inference`, `precision_test`, or `all` | `all` |
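
The `--mode` flag could be wired up along these lines. This is a minimal sketch, not the actual contents of `inference.py`: the function names `parse_args` and `select_steps` are hypothetical, and only the flag name, its three values, and the default come from the table above.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; only --mode is documented in this README."""
    parser = argparse.ArgumentParser(description="ViTMatte NPU test (sketch)")
    parser.add_argument(
        "--mode",
        choices=["inference", "precision_test", "all"],
        default="all",
        help="which test to run",
    )
    return parser.parse_args(argv)


def select_steps(mode):
    """Map a mode to the ordered list of steps it would run (hypothetical)."""
    steps = {
        "inference": ["inference"],
        "precision_test": ["precision_test"],
        "all": ["inference", "precision_test"],
    }
    return steps[mode]


# Example: pass an explicit argument list instead of reading sys.argv
args = parse_args(["--mode", "inference"])
for step in select_steps(args.mode):
    print(f"would run: {step}")
```

Using `choices` lets argparse reject an unknown mode before any model is loaded.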
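
Precision test results (CPU vs. NPU):

| Metric | Measured | Threshold | Status |
|---|---|---|---|
| Max relative error | 0.8556% | < 1.00% | PASS |
| Max absolute error | 2.67e-05 | - | - |
| CPU inference time | 6.952s | - | - |
| NPU inference time | 0.044s | - | - |
| Speedup | 158.55x | > 1x | PASS |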
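
The two error metrics in the table can be computed roughly as follows. This is a sketch under assumptions: the README does not state how `inference.py` defines relative error, so the formula below (absolute error divided by the magnitude of the CPU reference, with a small `eps` to avoid division by zero) and the helper name `compare_outputs` are illustrative.

```python
import numpy as np


def compare_outputs(cpu_out, npu_out, eps=1e-8, rel_threshold=0.01):
    """Return max absolute error, max relative error, and a pass flag.

    Assumed definition: relative error = |npu - cpu| / (|cpu| + eps).
    """
    cpu_out = np.asarray(cpu_out, dtype=np.float64)
    npu_out = np.asarray(npu_out, dtype=np.float64)
    abs_err = np.abs(npu_out - cpu_out)
    rel_err = abs_err / (np.abs(cpu_out) + eps)
    max_abs = float(abs_err.max())
    max_rel = float(rel_err.max())
    return max_abs, max_rel, max_rel < rel_threshold


# Toy data standing in for the CPU and NPU alpha mattes
cpu = np.array([0.50, 0.25, 1.00])
npu = np.array([0.501, 0.25, 1.00])
max_abs, max_rel, passed = compare_outputs(cpu, npu)
print(f"max abs: {max_abs:.3e}, max rel: {max_rel:.4%}, pass: {passed}")
```

With a definition like this, pixels whose reference alpha is close to zero can dominate the maximum relative error, which is one reason to report the absolute error alongside it.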

Timing:

| Operation | Time |
|---|---|
| NPU inference time (512x512) | 6.427s |
| Precision test, CPU time | 6.952s |
| Precision test, NPU time | 0.044s |
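
The gap between 6.427s and 0.044s for the same 512x512 input suggests that one-time costs such as operator compilation are included in the first figure. One way to separate those costs from steady-state latency is to run a few warm-up iterations before timing. The sketch below is generic and uses a stand-in workload; `benchmark` and `sync` are hypothetical names, and on an NPU you would likely pass a device synchronization call (for example `torch.npu.synchronize`, assuming your `torch_npu` version provides it) as `sync` so that asynchronous work has finished before the clock stops.

```python
import time


def benchmark(fn, warmup=2, repeats=5, sync=None):
    """Return (first_call_seconds, mean_steady_state_seconds) for fn()."""

    def timed_call():
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous device work to finish
        return time.perf_counter() - start

    first = timed_call()  # includes any one-time setup cost
    for _ in range(max(warmup - 1, 0)):
        timed_call()  # remaining warm-up runs, discarded
    steady = [timed_call() for _ in range(repeats)]
    return first, sum(steady) / len(steady)


# Stand-in workload: slow on the first call, fast afterwards
state = {"compiled": False}


def fake_inference():
    if not state["compiled"]:
        time.sleep(0.05)  # pretend to compile operators
        state["compiled"] = True
    time.sleep(0.001)


first, steady = benchmark(fake_inference)
print(f"first call: {first:.3f}s, steady state: {steady:.4f}s")
```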

- Input image: 512x512 RGB
- Input trimap: 512x512 mask (0 = background, 1 = unknown, 2 = foreground)
- Output alpha: 512x512 alpha (transparency) channel
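
A predicted alpha matte is typically used to composite the subject onto a new background with the standard matting equation, composite = alpha * foreground + (1 - alpha) * background. The sketch below applies it with NumPy to small arrays laid out like the inputs and outputs listed above. The function name `composite` is illustrative and not part of this repository, and using the input image as the foreground colors is an approximation in the unknown region.

```python
import numpy as np


def composite(image, alpha, background):
    """Blend image over background using a per-pixel alpha matte.

    image, background: (H, W, 3) float arrays in [0, 1]
    alpha:             (H, W) float array in [0, 1]
    """
    alpha = alpha[..., None]  # (H, W, 1) so it broadcasts over RGB
    return alpha * image + (1.0 - alpha) * background


# Tiny 2x2 example: white foreground over a black background
image = np.ones((2, 2, 3))
background = np.zeros((2, 2, 3))
alpha = np.array([[1.0, 0.5], [0.25, 0.0]])
result = composite(image, alpha, background)
print(result[..., 0])
```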
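
Minimal example with random inputs:

```python
import numpy as np
import torch
import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
from transformers import VitMatteForImageMatting, VitMatteImageProcessor

MODEL_DIR = "/data/ysws/agentsp/5-16/vitmatte-small-composition-1k/hustvl/vitmatte-small-composition-1k"

# Load the model and processor, then move the model to the NPU
model = VitMatteForImageMatting.from_pretrained(MODEL_DIR)
processor = VitMatteImageProcessor.from_pretrained(MODEL_DIR)
model = model.to("npu:0").eval()

# Random inputs, suitable for a smoke test only
image = np.random.rand(512, 512, 3)
# randint's upper bound is exclusive, so use 3 to draw all three trimap states
trimap = np.random.randint(0, 3, (512, 512)).astype(np.float32)

inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
pixel_values = inputs["pixel_values"].to("npu:0")

with torch.no_grad():
    outputs = model(pixel_values)
alphas = outputs.alphas
```

Example with real images:

```python
from PIL import Image

img = Image.open("your_image.jpg").convert("RGB")
trimap_img = Image.open("your_trimap.png").convert("L")

# Pass the 8-bit images directly: with its default settings the processor
# already rescales by 1/255, so dividing beforehand would scale twice.
inputs = processor(images=img, trimaps=trimap_img, return_tensors="pt")
with torch.no_grad():
    outputs = model(inputs["pixel_values"].to("npu:0"))

alpha = outputs.alphas.cpu().numpy()[0, 0]
# The processor may pad its inputs; crop back to the original size
alpha = alpha[: img.height, : img.width]
result_img = Image.fromarray((alpha * 255).clip(0, 255).astype(np.uint8))
result_img.save("result.png")
```

Model architecture:

| Component | Description |
|---|---|
| backbone | ViT-Det feature extractor |
| convstream | Convolutional stream for feature fusion |
| fusion | Fusion module that outputs the final alpha |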
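
Key parameters extracted from `config.json`:

```json
{
  "hidden_size": 384,
  "image_size": 512,
  "model_type": "vitmatte",
  "backbone_config": {
    "hidden_size": 384,
    "num_attention_heads": 6,
    "num_channels": 4,
    "window_size": 14
  },
  "convstream_hidden_sizes": [48, 96, 192],
  "fusion_hidden_sizes": [256, 128, 64, 32]
}
```

A: Check that the NPU driver is installed correctly. A numerical error of 0.85% is normal, because the NPU and the CPU use different computation precision and operator implementations.

A: The first inference has to compile operators. The ViTMatte model is relatively large; once warmed up, NPU inference is very fast (0.044s vs. 6.95s on CPU).

A: A trimap is a three-valued mask that marks the state of every pixel in the image: 0 = definite background, 1 = unknown region, 2 = definite foreground. The model uses the trimap to estimate the alpha values in the unknown region.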
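
If a trimap is stored with the state indices described above (0, 1, 2), it can be converted to an 8-bit grayscale image with a lookup table. The gray levels 0, 128, and 255 used here are an assumption based on a common convention for trimap images, not something this README specifies, and `states_to_gray` is a hypothetical helper.

```python
import numpy as np

# Assumed gray levels for background, unknown, and foreground
GRAY_LEVELS = np.array([0, 128, 255], dtype=np.uint8)


def states_to_gray(trimap_states):
    """Map a trimap of state indices {0, 1, 2} to 8-bit gray levels."""
    states = np.asarray(trimap_states)
    if states.min() < 0 or states.max() > 2:
        raise ValueError("trimap states must be 0, 1, or 2")
    return GRAY_LEVELS[states.astype(np.intp)]


states = np.array([[0, 1], [2, 1]])
gray = states_to_gray(states)
print(gray)
```

The resulting array can be saved as a single-channel image, for example with `PIL.Image.fromarray(gray)`.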

Test log:

```
============================================================
ViTMatte NPU Test
Model: hustvl/vitmatte-small-composition-1k
Output: /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend
============================================================

============================================================
ViTMatte Inference Test (NPU)
============================================================
Device: npu:0
Model: /data/ysws/agentsp/5-16/vitmatte-small-composition-1k/hustvl/vitmatte-small-composition-1k
Loading model...
Loading weights: 100%|██████████| 258/258 [00:00<00:00, 4963.51it/s]
Model loaded successfully
Image shape: torch.Size([1, 3, 512, 512])
Trimap shape: torch.Size([1, 1, 512, 512])
Pixel values shape: torch.Size([1, 4, 512, 512])
Output alpha shape: torch.Size([1, 1, 512, 512])
Inference time: 6.427s
Inference result saved to /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend/inference_result.json

============================================================
Precision Test (CPU vs NPU)
============================================================
Using device: npu:0
Loading model on CPU...
Loading weights: 100%|██████████| 258/258 [00:00<00:00, 4597.71it/s]
Loading model on npu:0...
Loading weights: 100%|██████████| 258/258 [00:00<00:00, 4690.33it/s]
Loading processor...
Running inference on CPU...
Running inference on NPU...
CPU inference time: 6.952s
NPU inference time: 0.044s
Speedup: 158.55x
Max absolute error: 2.669613e-05
Max relative error: 0.8556% (threshold: 1.0%)
Status: PASS
Precision result saved to /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend/precision_result.json

============================================================
Creating Test Sample
============================================================
Saved test image: /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend/test_image.npy (shape: torch.Size([3, 512, 512]))
Saved test trimap: /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend/test_trimap.npy (shape: torch.Size([1, 512, 512]))
Saved test sample info: /data/ysws/agentsp/5-16/vitmatte-small-composition-1k-ascend/test_sample.txt

============================================================
Test Complete!
============================================================
```

This project is licensed under the Apache-2.0 license.