cd VTP-Small-f16d64-ascend/
python inference.py --precision_test
Inference with random input
cd VTP-Small-f16d64-ascend/
python inference.py
Inference with an image
cd VTP-Small-f16d64-ascend/
python inference.py --image /path/to/image.jpg
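Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --model_path | Model path | /opt/atomgit/mxy/VTP-Small-f16d64 |
| --image | Image path | None (random input is used) |
| --device | Device to run on | npu:0 |
| --precision_test | Run the precision test | False |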
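The parameters above map naturally onto an `argparse` parser. The following is a minimal sketch of such a parser, not the actual contents of `inference.py`: the function name `build_parser` and the help strings are assumptions, and only the flag names and defaults come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="VTP Vision Encoder Ascend NPU Inference"
    )
    parser.add_argument(
        "--model_path",
        default="/opt/atomgit/mxy/VTP-Small-f16d64",
        help="Model path",
    )
    parser.add_argument(
        "--image",
        default=None,
        help="Image path; random input is used when omitted",
    )
    parser.add_argument("--device", default="npu:0", help="Device to run on")
    parser.add_argument(
        "--precision_test",
        action="store_true",  # defaults to False when the flag is absent
        help="Run the precision test",
    )
    return parser


# Example: equivalent to `python inference.py --image /path/to/image.jpg`
args = build_parser().parse_args(["--image", "/path/to/image.jpg"])
print(args.device, args.image, args.precision_test)
# prints: npu:0 /path/to/image.jpg False
```

Using `action="store_true"` keeps `--precision_test` a plain switch, which matches the way it is invoked in the commands above.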
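Test verification

Precision test results

| Metric | Measured | Threshold | Status |
| --- | --- | --- | --- |
| Max Error (sum) | 6.10e-05 | < 1e-3 | ✅ PASS |
| Max Error (mean) | 5.96e-08 | < 1e-5 | ✅ PASS |
| Max Error (std) | 2.38e-07 | < 1e-5 | ✅ PASS |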
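One plausible reading of these metrics is: for each test tensor, compute the sum, mean, and standard deviation on both devices, take the absolute difference, and report the largest difference across all tensors. The sketch below illustrates that reading only. It is an assumption about how the check works, not the code in `inference.py`. Plain Python lists stand in for tensors, the helper names are hypothetical, and the population standard deviation is used (the real script may use a different estimator).

```python
import math
import random

# Thresholds taken from the results table above.
THRESHOLDS = {"sum": 1e-3, "mean": 1e-5, "std": 1e-5}


def summarize(values):
    """Return sum, mean, and population std of a flat list of floats."""
    n = len(values)
    total = math.fsum(values)
    mean = total / n
    variance = math.fsum((v - mean) ** 2 for v in values) / n
    return {"sum": total, "mean": mean, "std": math.sqrt(variance)}


def max_errors(reference, candidate):
    """Largest absolute difference per statistic over all tensor pairs."""
    errors = {name: 0.0 for name in THRESHOLDS}
    for ref, cand in zip(reference, candidate):
        ref_stats, cand_stats = summarize(ref), summarize(cand)
        for name in errors:
            diff = abs(ref_stats[name] - cand_stats[name])
            errors[name] = max(errors[name], diff)
    return errors


def passes(errors):
    """True when every statistic is strictly below its threshold."""
    return all(errors[name] < THRESHOLDS[name] for name in THRESHOLDS)


# Simulated data: 20 "CPU" tensors and slightly perturbed "NPU" copies.
rng = random.Random(0)
cpu_tensors = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(20)]
npu_tensors = [[v + 1e-8 for v in tensor] for tensor in cpu_tensors]

errors = max_errors(cpu_tensors, npu_tensors)
print("PASS" if passes(errors) else "FAIL", errors)
```

Comparing summary statistics is cheaper than an element-wise comparison, but it can miss errors that cancel out, so an element-wise tolerance check is a reasonable complement.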
Performance data

| Operation | Time |
| --- | --- |
| CPU reference computation (20 tensors) | 0.0190s |
| NPU inference (20 tensors) | 0.2261s |
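Timings like the ones above are typically taken with a wall-clock timer around the call. The helper below is a hedged sketch, not the timing code used by `inference.py`: the name `timed` and its `sync` parameter are hypothetical. On an accelerator, work may be queued asynchronously, so the sketch accepts an optional callable that blocks until the device is idle (for example `torch.npu.synchronize`, assuming it is available in your `torch_npu` setup).

```python
import time


def timed(fn, *args, sync=None, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds).

    `sync` is an optional zero-argument callable that blocks until
    pending device work has finished; it is called before starting
    and before stopping the timer.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return result, elapsed


# CPU-only usage example; no synchronization is needed here.
result, seconds = timed(sum, range(1000))
print(f"computation done in {seconds:.4f}s")
```

A single timed run can include one-time startup cost, so the figures in the table need not reflect steady-state throughput. Timing several repeated runs after a warm-up call gives a more representative number.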
Test log
2026-05-19 08:47:41,266 - INFO - ============================================================
2026-05-19 08:47:41,267 - INFO - VTP Vision Encoder Ascend NPU Inference
2026-05-19 08:47:41,267 - INFO - ============================================================
2026-05-19 08:47:41,267 - INFO - Loading VTP model from /opt/atomgit/mxy/VTP-Small-f16d64...
2026-05-19 08:47:43,297 - INFO - Model loaded and moved to NPU!
2026-05-19 08:47:43,305 - INFO - ----------------------------------------
2026-05-19 08:47:43,305 - INFO - Starting precision test...
2026-05-19 08:47:43,306 - INFO - Running CPU reference computation...
2026-05-19 08:47:43,324 - INFO - CPU computation done in 0.0190s (tested 20 tensors)
2026-05-19 08:47:43,324 - INFO - Step 2: NPU inference
2026-05-19 08:47:43,550 - INFO - NPU inference done in 0.2261s
2026-05-19 08:47:43,551 - INFO - Step 3: Compare results
2026-05-19 08:47:43,551 - INFO - ============================================================
2026-05-19 08:47:43,551 - INFO - Precision Comparison: CPU vs NPU
2026-05-19 08:47:43,551 - INFO - Max errors: sum=6.10e-05, mean=5.96e-08, std=2.38e-07
2026-05-19 08:47:43,553 - INFO - PASS: NPU precision within 1% of CPU
2026-05-19 08:47:43,553 - INFO - PRECISION TEST PASSED
2026-05-19 08:47:43,553 - INFO - ============================================================
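For automated checks, the `Max errors` line in a log like the one above can be parsed and compared against the thresholds from the results table. This is a small sketch under the assumption that the line format stays exactly as shown. The function name `parse_max_errors` is hypothetical and not part of the repository.

```python
import re
from typing import Dict, Optional

# Matches e.g. "Max errors: sum=6.10e-05, mean=5.96e-08, std=2.38e-07"
MAX_ERRORS_RE = re.compile(
    r"Max errors: sum=([0-9.eE+-]+), mean=([0-9.eE+-]+), std=([0-9.eE+-]+)"
)

# Thresholds from the precision results table.
THRESHOLDS = {"sum": 1e-3, "mean": 1e-5, "std": 1e-5}


def parse_max_errors(line: str) -> Optional[Dict[str, float]]:
    """Extract the three error values from a log line, or None if absent."""
    match = MAX_ERRORS_RE.search(line)
    if match is None:
        return None
    total, mean, std = (float(group) for group in match.groups())
    return {"sum": total, "mean": mean, "std": std}


log_line = (
    "2026-05-19 08:47:43,551 - INFO - "
    "Max errors: sum=6.10e-05, mean=5.96e-08, std=2.38e-07"
)
parsed = parse_max_errors(log_line)
within_limits = all(parsed[name] < THRESHOLDS[name] for name in THRESHOLDS)
print(parsed, within_limits)
```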
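Model configuration

| Property | Value |
| --- | --- |
| Model name | VTP-Small-f16d64 |
| Architecture | Vision Tokenizer with Projection |
| Vision embedding dimension | 384 |
| Feature bottleneck dimension | 64 (f16d64) |
| Patch size | 16 |
| Image size | 256 x 256 |
| Vision encoder depth | 12 |
| Vision attention heads | 6 |
| MLP type | SwiGLU (Gated Linear Unit) |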
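A few useful quantities follow directly from the configuration above by arithmetic: the patch grid size, the number of patches per image, and the per-head attention dimension. The sketch below computes them. The class and field names are hypothetical and do not correspond to the model's actual config object, and whether the model adds extra tokens (such as a class token) to the patch sequence is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VTPSmallConfig:
    """Hypothetical container for the values in the configuration table."""

    embed_dim: int = 384       # vision embedding dimension
    bottleneck_dim: int = 64   # feature bottleneck dimension (d64)
    patch_size: int = 16
    image_size: int = 256      # square input, 256 x 256
    depth: int = 12            # vision encoder depth
    num_heads: int = 6         # vision attention heads

    @property
    def grid_size(self) -> int:
        """Patches along one side of the image."""
        return self.image_size // self.patch_size

    @property
    def num_patches(self) -> int:
        """Total patches per image, excluding any extra tokens."""
        return self.grid_size ** 2

    @property
    def head_dim(self) -> int:
        """Embedding channels handled by each attention head."""
        return self.embed_dim // self.num_heads


cfg = VTPSmallConfig()
print(cfg.grid_size, cfg.num_patches, cfg.head_dim)
# prints: 16 256 64
```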
Python API usage example
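from PIL import Image
import torch
import torch_npu  # Ascend adapter for PyTorch; makes the "npu" device available
from torchvision import transforms

# NOTE: load_vtp_model is not defined in this snippet. It is assumed to be the
# loader shipped with this repository (for example in inference.py).

MODEL_PATH = "/opt/atomgit/mxy/VTP-Small-f16d64"
DEVICE = "npu:0"

# Resize to the model's 256 x 256 input and apply ImageNet normalization
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("image.jpg").convert("RGB")
img_tensor = transform(img).unsqueeze(0).to(DEVICE)  # add a batch dimension

# Load the model
model = load_vtp_model(MODEL_PATH)
model = model.to(DEVICE)
model.eval()

with torch.no_grad():
    features = model(img_tensor)

print(f"Feature shape: {features.shape}")  # (1, 64)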