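sam-vit-base (Segment Anything Model, ViT-Base) is an image segmentation model developed by Meta AI. Given input prompts such as points or boxes, it generates high-quality object masks. The model was trained on a dataset of 11 million images and 1.1 billion masks and shows strong zero-shot performance across a variety of segmentation tasks.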

sam-vit-base-ascend/
├── inference.py # inference test script
├── test_mask.png # test segmentation mask output
├── log.txt # precision test log
├── log_inference.txt # inference test log
├── README.md # this document

docker exec -it test-modelagent bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh

The model files are located in the /data/ysws/agentsp/5-15/sam-vit-base/ directory.

Run the inference script to segment an image:

cd /data/ysws/agentsp/5-15/sam-vit-base-ascend/
python3 inference.py --mode inference --device npu:0

Run the precision comparison test to verify that the NPU results are consistent with the CPU results:

cd /data/ysws/agentsp/5-15/sam-vit-base-ascend/
python3 inference.py --mode precision_test

| Parameter | Description | Default |
|---|---|---|
| --mode | Test mode: inference or precision_test | inference |
| --device | Device to run on | npu:0 (auto-detected) |

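The two flags in the table could be parsed with `argparse` roughly as follows. This is a minimal sketch, not the actual parser in inference.py (which is not reproduced in this document): the function name `build_parser` is hypothetical, and the device auto-detection mentioned in the table is left out, with the documented default hard-coded instead.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags of inference.py."""
    parser = argparse.ArgumentParser(description="sam-vit-base NPU test (sketch)")
    parser.add_argument(
        "--mode",
        choices=["inference", "precision_test"],
        default="inference",
        help="test mode",
    )
    # Assumption: the real script auto-detects the device; this sketch simply
    # uses the documented default value.
    parser.add_argument("--device", default="npu:0", help="device to run on")
    return parser


args = build_parser().parse_args(["--mode", "precision_test"])
print(args.mode, args.device)
```

Restricting `--mode` with `choices` makes a mistyped mode fail at argument parsing rather than after the model has been loaded.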
| Metric | Measured | Threshold | Status |
|---|---|---|---|
| IoU max relative error | 0.0152% | < 1.00% | PASS |
| IoU cosine similarity | 1.000000 | > 0.99 | PASS |
| Pred Masks max relative error | 0.1077% | < 1.00% | PASS |
| Pred Masks cosine similarity | 1.000000 | > 0.99 | PASS |

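The metrics in the table compare the CPU output with the NPU output element by element. The exact formulas used by inference.py are not shown in this document, so the sketch below is an assumption: it defines the relative error as the largest absolute difference divided by the largest reference magnitude, and it works on flat Python lists instead of tensors. The function names and sample values are illustrative only.

```python
import math


def max_relative_error(ref, test):
    """Largest absolute difference, normalised by the largest reference magnitude.

    Assumed definition; the real script may normalise differently.
    """
    max_abs_err = max(abs(r - t) for r, t in zip(ref, test))
    scale = max(abs(r) for r in ref)
    return max_abs_err / scale if scale > 0 else max_abs_err


def cosine_similarity(ref, test):
    """Cosine of the angle between the two flattened outputs."""
    dot = sum(r * t for r, t in zip(ref, test))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(t * t for t in test))
    return dot / norm if norm > 0 else 0.0


def passes(ref, test, rel_threshold=0.01, cos_threshold=0.99):
    """Apply the two thresholds from the table (< 1.00% and > 0.99)."""
    return (
        max_relative_error(ref, test) < rel_threshold
        and cosine_similarity(ref, test) > cos_threshold
    )


# Illustrative numbers, not measured data
cpu_values = [1.0, -2.0, 4.0]
npu_values = [1.001, -2.002, 3.998]
print(passes(cpu_values, npu_values))
```

With real outputs, the same functions would be applied to the flattened `pred_masks` and `iou_scores` tensors after moving them to the CPU.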
| Operation | Time |
|---|---|
| CPU inference time | 58.590 s |
| NPU inference time | 6.154 s |
| NPU speedup | ~9.5x |

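The speedup figure follows directly from the two timings in the table. The short calculation below reproduces it; note that both timings come from a single run each, so the ratio is an indication rather than a benchmark average.

```python
cpu_seconds = 58.590  # CPU inference time from the precision test log
npu_seconds = 6.154   # NPU inference time from the same run

speedup = cpu_seconds / npu_seconds
print(f"NPU speedup: ~{speedup:.1f}x")  # prints "NPU speedup: ~9.5x"
```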
| Input | Output shape | Inference time |
|---|---|---|
| 256x256 RGB image + point prompt | [1, 1, 3, 256, 256] masks | 5.905 s |

2026-05-15 14:49:51,781 - INFO - ============================================================
2026-05-15 14:49:51,781 - INFO - sam-vit-base NPU 推理测试
2026-05-15 14:49:51,781 - INFO - ============================================================
2026-05-15 14:49:51,781 - INFO - Model dir: /data/ysws/agentsp/5-15/sam-vit-base
2026-05-15 14:49:51,781 - INFO - Output dir: /data/ysws/agentsp/5-15/sam-vit-base-ascend
2026-05-15 14:49:51,781 - INFO - NPU available: True
2026-05-15 14:49:51,781 - INFO - NPU device count: 8
2026-05-15 14:49:53,433 - INFO - NPU 0: Ascend910B3, total_memory=61.0GB
2026-05-15 14:49:53,433 - INFO - NPU 1: Ascend910B3, total_memory=61.0GB
2026-05-15 14:49:53,433 - INFO - ============================================================
2026-05-15 14:49:53,433 - INFO - Inference Test on npu:0
2026-05-15 14:49:53,433 - INFO - ============================================================
2026-05-15 14:49:58,822 - INFO - Device: npu:0
2026-05-15 14:49:58,823 - INFO - Loading processor...
2026-05-15 14:50:00,539 - INFO - Model loaded successfully
2026-05-15 14:50:00,541 - INFO - Test image size: (256, 256)
2026-05-15 14:50:00,613 - INFO - pixel_values shape: torch.Size([1, 3, 1024, 1024])
2026-05-15 14:50:00,613 - INFO - input_points shape: torch.Size([1, 1, 1, 2])
2026-05-15 14:50:06,518 - INFO - Inference time: 5.905s
2026-05-15 14:50:06,519 - INFO - Output type: <class 'transformers.models.sam.modeling_sam.SamImageSegmentationOutput'>
2026-05-15 14:50:06,519 - INFO - pred_masks shape: torch.Size([1, 1, 3, 256, 256])
2026-05-15 14:50:06,519 - INFO - iou_scores shape: torch.Size([1, 1, 3])
2026-05-15 14:50:07,792 - INFO - Post-processed masks: 1 mask(s)
2026-05-15 14:50:07,792 - INFO - mask[0] shape: torch.Size([1, 3, 256, 256])
2026-05-15 14:50:07,805 - INFO - Saved mask to: /data/ysws/agentsp/5-15/sam-vit-base-ascend/test_mask.png
2026-05-15 14:50:07,808 - INFO - ============================================================
2026-05-15 14:50:07,808 - INFO - INFERENCE RESULT
2026-05-15 14:50:07,808 - INFO - ============================================================
2026-05-15 14:50:07,808 - INFO - Inference time: 5.905s
2026-05-15 14:50:07,808 - INFO - ============================================================
2026-05-15 14:50:07,808 - INFO - Test Complete!
2026-05-15 14:50:07,808 - INFO - ============================================================

2026-05-15 14:48:00,415 - INFO - ============================================================
2026-05-15 14:48:00,415 - INFO - sam-vit-base NPU 推理测试
2026-05-15 14:48:00,415 - INFO - ============================================================
2026-05-15 14:48:00,415 - INFO - Model dir: /data/ysws/agentsp/5-15/sam-vit-base
2026-05-15 14:48:00,415 - INFO - Output dir: /data/ysws/agentsp/5-15/sam-vit-base-ascend
2026-05-15 14:48:00,415 - INFO - NPU available: True
2026-05-15 14:48:00,416 - INFO - NPU device count: 8
2026-05-15 14:48:02,078 - INFO - NPU 0: Ascend910B3, total_memory=61.0GB
2026-05-15 14:48:02,079 - INFO - NPU 1: Ascend910B3, total_memory=61.0GB
2026-05-15 14:48:02,079 - INFO - ============================================================
2026-05-15 14:48:02,079 - INFO - Precision Test: CPU vs NPU (threshold: 1.0%)
2026-05-15 14:48:02,079 - INFO - ============================================================
2026-05-15 14:48:07,540 - INFO - Loading processor...
2026-05-15 14:48:07,550 - INFO - Loading model for CPU...
2026-05-15 14:48:07,846 - INFO - Loading model for NPU...
2026-05-15 14:48:09,272 - INFO - pixel_values shape: torch.Size([1, 3, 1024, 1024])
2026-05-15 14:48:09,274 - INFO - input_points shape: torch.Size([1, 1, 1, 2])
2026-05-15 14:48:09,274 - INFO - Running inference on CPU...
2026-05-15 14:49:07,892 - INFO - Running inference on NPU...
2026-05-15 14:49:15,263 - INFO - pred_masks CPU shape: (1, 1, 3, 256, 256)
2026-05-15 14:49:15,263 - INFO - pred_masks NPU shape: (1, 1, 3, 256, 256)
2026-05-15 14:49:15,263 - INFO - CPU inference time: 58.590s
2026-05-15 14:49:15,264 - INFO - NPU inference time: 6.154s
2026-05-15 14:49:15,268 - INFO - === IoU Scores Precision ===
2026-05-15 14:49:15,268 - INFO - IoU max relative error: 1.521992e-04 (0.0152%)
2026-05-15 14:49:15,268 - INFO - IoU cosine similarity: 1.000000
2026-05-15 14:49:15,268 - INFO - === Pred Masks Precision ===
2026-05-15 14:49:15,268 - INFO - Max absolute error: 1.959991e-02
2026-05-15 14:49:15,268 - INFO - Max relative error: 1.076624e-03 (0.1077%)
2026-05-15 14:49:15,268 - INFO - Mean relative error: 2.195446e-04 (0.0220%)
2026-05-15 14:49:15,269 - INFO - Cosine similarity: 1.000000 (-0.0000% angular error)
2026-05-15 14:49:15,269 - INFO - PASS: True (threshold: 1.0%)
2026-05-15 14:49:15,302 - INFO - ============================================================
2026-05-15 14:49:15,302 - INFO - PRECISION TEST RESULT
2026-05-15 14:49:15,302 - INFO - ============================================================
2026-05-15 14:49:15,302 - INFO - Relative error: 1.076624e-03
2026-05-15 14:49:15,302 - INFO - CPU time: 58.590s
2026-05-15 14:49:15,303 - INFO - NPU time: 6.154s
2026-05-15 14:49:15,303 - INFO - PASS: True
2026-05-15 14:49:15,303 - INFO - ============================================================
2026-05-15 14:49:15,303 - INFO - Test Complete!
2026-05-15 14:49:15,303 - INFO - ============================================================

import torch
import torch_npu  # Ascend adapter that registers the "npu" device (assumed installed; not in the original snippet)
import numpy as np
from PIL import Image
from transformers import SamModel, SamProcessor

MODEL_DIR = "/data/ysws/agentsp/5-15/sam-vit-base"

# Load the processor and the model, then move the model to the first NPU
processor = SamProcessor.from_pretrained(MODEL_DIR)
model = SamModel.from_pretrained(MODEL_DIR)
model = model.to("npu:0")
model.eval()

# Random 256x256 RGB test image with a single point prompt at its centre
raw_image = Image.fromarray(np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8))
input_points = [[[128, 128]]]
inputs = processor(raw_image, input_points=input_points, return_tensors="pt")
inputs = {k: v.to("npu:0") if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks to the original image size and binarize them
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(f"Masks shape: {masks[0].shape}")  # [1, 3, 256, 256]

input_points = [[[450, 600]]]  # 2D point coordinates
inputs = processor(raw_image, input_points=input_points, return_tensors="pt")

input_boxes = [[[x1, y1, x2, y2]]]  # bounding box coordinates
inputs = processor(raw_image, input_boxes=input_boxes, return_tensors="pt")

from PIL import Image
mask_output = masks[0][0, 0].cpu().numpy()
mask_img = Image.fromarray((mask_output * 255).astype(np.uint8))
mask_img.save("output_mask.png")

The SAM model consists of three main modules:
| Component | Description |
|---|---|
| vision_encoder | ViT-Base, 12 layers, hidden size 768 |
| prompt_encoder | Hidden size 256, 4 point embeddings |
| mask_decoder | 2-layer transformer, outputs 256x256 masks |

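The encoder dimensions are related through the patch grid. As a small worked example, using the `image_size`, `patch_size`, and `image_embedding_size` values that this README reports from config.json, the number of patches per side works out to the prompt encoder's embedding size:

```python
image_size = 1024  # vision_config.image_size
patch_size = 16    # vision_config.patch_size

grid = image_size // patch_size  # patches along each side of the image
num_patches = grid * grid        # tokens processed by the vision encoder

print(grid, num_patches)  # prints "64 4096"
```

The resulting grid of 64 matches `prompt_encoder_config.image_embedding_size`, which is why prompts and image embeddings can share the same spatial layout.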
Key parameters extracted from config.json:
{
"vision_config.hidden_size": 768,
"vision_config.num_hidden_layers": 12,
"vision_config.num_attention_heads": 12,
"vision_config.image_size": 1024,
"vision_config.patch_size": 16,
"prompt_encoder_config.hidden_size": 256,
"prompt_encoder_config.image_embedding_size": 64,
"mask_decoder_config.hidden_size": 256,
"mask_decoder_config.num_hidden_layers": 2
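}

A: NPU inference is about 9.5x faster than CPU inference (6.154 s versus 58.590 s in the precision test). Batching point prompts (the points_per_batch parameter) can further improve throughput.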

A: The output masks have shape [batch, 1, num_masks, height, width], where num_masks is typically 3 (three alternative mask predictions for the same prompt).
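
Since each prompt yields several candidate masks, a common follow-up step is to keep the candidate with the highest predicted IoU score. The sketch below shows that selection on a plain list; the helper name `best_mask_index` and the score values are hypothetical, not taken from the test run.

```python
def best_mask_index(iou_scores):
    """Return the index of the candidate mask with the highest predicted IoU."""
    return max(range(len(iou_scores)), key=lambda i: iou_scores[i])


# Hypothetical scores for the three candidates of one point prompt
scores = [0.71, 0.93, 0.88]
idx = best_mask_index(scores)
print(idx)  # prints "1"
```

With the tensors from the usage example, the scores for one prompt would be `outputs.iou_scores[0, 0]` (shape [3] according to the log), and the selected mask would be `masks[0][0, idx]`.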

A: The SAM preprocessor automatically resizes the image so that its longest side is 1024 pixels. This can be changed through the size parameter in preprocessor_config.json.
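
The effect of the longest-side resize on image dimensions and on point prompt coordinates can be sketched as follows. This is an illustration under assumptions rather than the processor's own code: the helper names are hypothetical and rounding to the nearest integer is assumed. In practice `SamProcessor` performs this step internally, so prompts are passed in original image coordinates.

```python
def resize_longest_side(height, width, target=1024):
    """Scale (height, width) so that the longest side equals `target`."""
    scale = target / max(height, width)
    return int(height * scale + 0.5), int(width * scale + 0.5)


def scale_point(x, y, old_hw, new_hw):
    """Map a point from original image coordinates to resized coordinates."""
    old_h, old_w = old_hw
    new_h, new_w = new_hw
    return x * new_w / old_w, y * new_h / old_h


new_hw = resize_longest_side(256, 256)
print(new_hw)                                      # prints "(1024, 1024)"
print(scale_point(128, 128, (256, 256), new_hw))   # prints "(512.0, 512.0)"
```

For the 256x256 test image this agrees with the `pixel_values` shape of [1, 3, 1024, 1024] reported in the inference log.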

This project is licensed under the Apache-2.0 license.