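VTP-Base-f16d64 Ascend NPU Deployment Guide

Overview

This project provides a deployment recipe for the MiniMax VTP-Base-f16d64 model on Huawei Ascend NPUs, for visual feature extraction and image reconstruction.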

Model Information

Property	Value
Model name	VTP-Base-f16d64
Architecture	VTPModel
Vision bottleneck dimension	64
embed_dim	768
vision_depth	12
vision_num_heads	12
vision_mlp_ratio	4
Image size	256x256
patch_size	16
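
The values in the table imply a few derived sizes that are handy when checking tensor shapes. The following is a minimal sketch, assuming a standard ViT-style patch grid; the source does not confirm the internal layout of VTPModel, and the names `VisionConfig` and `derived_shapes` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionConfig:
    """Values copied from the model information table (class name is illustrative)."""
    image_size: int = 256
    patch_size: int = 16
    embed_dim: int = 768
    depth: int = 12
    num_heads: int = 12
    mlp_ratio: int = 4
    bottleneck_dim: int = 64


def derived_shapes(cfg: VisionConfig) -> dict:
    """Derive sizes under the ASSUMPTION of a standard ViT-style layout."""
    grid = cfg.image_size // cfg.patch_size  # patches per image side
    return {
        "patches_per_side": grid,
        "num_patches": grid * grid,
        "head_dim": cfg.embed_dim // cfg.num_heads,
        "mlp_hidden_dim": cfg.embed_dim * cfg.mlp_ratio,
    }


if __name__ == "__main__":
    print(derived_shapes(VisionConfig()))
```

Under that assumption, a 256x256 image with patch_size 16 gives a 16x16 grid of patches.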

Requirements

NPU: Atlas 910B3
Python: 3.11
PyTorch: 2.8.0+ (torch_npu required)
safetensors
pillow
torchvision
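
Before running the script, it can help to confirm that the listed packages are importable. The sketch below only checks whether each module can be found; it does not verify versions or NPU availability. The function names are hypothetical and not part of inference.py.

```python
import importlib.util
import sys

# Import names for the packages listed above ("pillow" is imported as "PIL").
REQUIRED_MODULES = ["torch", "torch_npu", "safetensors", "PIL", "torchvision"]


def check_environment(modules=REQUIRED_MODULES) -> dict:
    """Return {module_name: found} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def python_matches(major: int = 3, minor: int = 11) -> bool:
    """True if the running interpreter is the Python version listed above."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
    print("python 3.11:", python_matches())
```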

File Structure

/data/ysws/agentsp/VTP-Base-f16d64-ascend/
├── README.md          # This document
├── inference.py       # Inference script
└── log.txt            # Run log

Running Inference

Precision test

docker exec test-modelagent bash -c "cd /data/ysws/agentsp/VTP-Base-f16d64-ascend && python inference.py --precision_test 2>&1 | tee log.txt"

Inference with random input

docker exec test-modelagent bash -c "cd /data/ysws/agentsp/VTP-Base-f16d64-ascend && python inference.py 2>&1 | tee log.txt"

Inference with an image

docker exec test-modelagent bash -c "cd /data/ysws/agentsp/VTP-Base-f16d64-ascend && python inference.py --image /tmp/test_image.jpg 2>&1 | tee log.txt"
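
The three commands above differ only in the flags passed to inference.py. As a rough sketch, a helper like the following could build the same `docker exec` invocation for use with `subprocess.run`; `build_command` is a hypothetical name, and the container name and working directory are taken from the commands above.

```python
import shlex

CONTAINER = "test-modelagent"
WORKDIR = "/data/ysws/agentsp/VTP-Base-f16d64-ascend"


def build_command(*script_args: str) -> list:
    """Build the `docker exec` argv for one inference.py run."""
    # Quote each argument so paths with spaces survive the inner bash -c.
    script = " ".join(["python", "inference.py", *map(shlex.quote, script_args)])
    inner = f"cd {shlex.quote(WORKDIR)} && {script} 2>&1 | tee log.txt"
    return ["docker", "exec", CONTAINER, "bash", "-c", inner]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for args in [("--precision_test",), (), ("--image", "/tmp/test_image.jpg")]:
        print(shlex.join(build_command(*args)))
```

Because `tee` is used without `-a`, each run overwrites log.txt.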

Arguments

Argument	Description	Default
--model_path	Model path	/data/ysws/agentsp/VTP-Base-f16d64
--image	Image path	None (random input is used)
--device	Device to run on	npu:0
--precision_test	Run the precision test	False
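
The argument table maps naturally onto `argparse`. The sketch below reproduces the documented flags and defaults; it is an assumption about how inference.py might declare them, not a copy of the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the argument table (a sketch, not the real script)."""
    parser = argparse.ArgumentParser(
        description="VTP-Base inference on Ascend NPU (illustrative)"
    )
    parser.add_argument("--model_path", default="/data/ysws/agentsp/VTP-Base-f16d64",
                        help="Model path")
    parser.add_argument("--image", default=None,
                        help="Image path; random input is used if omitted")
    parser.add_argument("--device", default="npu:0",
                        help="Device to run on")
    parser.add_argument("--precision_test", action="store_true",
                        help="Run the precision test")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```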

Precision Test Results

============================================================
Precision Comparison: CPU vs NPU
============================================================
Max errors: sum=3.05e-05, mean=2.98e-08, std=7.45e-09
PASS: NPU precision within thresholds
============================================================
PRECISION TEST PASSED
============================================================

Metric	Threshold	Measured	Status
max_error_sum	< 1e-3	3.05e-05	✅ Pass
max_error_mean	< 1e-5	2.98e-08	✅ Pass
max_error_std	< 1e-5	7.45e-09	✅ Pass
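
One way such a check could be implemented is sketched below. It assumes that each "max error" is the largest absolute difference between a statistic (sum, mean, std) computed on CPU and the same statistic computed on the NPU, taken over all floating-point tensors in the state_dict. The source does not spell out the exact definition, so treat `max_stat_errors` as a guess; only the thresholds come from the table.

```python
THRESHOLDS = {"sum": 1e-3, "mean": 1e-5, "std": 1e-5}  # from the table above


def max_stat_errors(state_dict, device="npu:0") -> dict:
    """Largest |CPU - device| gap in sum/mean/std over all float tensors.

    ASSUMPTION: this is how the reported "Max errors" are defined.
    On Ascend, `import torch_npu` is needed before an npu device can be used.
    """
    import torch  # imported lazily so the threshold check has no dependencies

    worst = {name: 0.0 for name in THRESHOLDS}
    for tensor in state_dict.values():
        if not torch.is_floating_point(tensor) or tensor.numel() < 2:
            continue  # std() is undefined for fewer than two elements
        cpu = tensor.detach().float().cpu()
        dev = cpu.to(device)
        for stat in worst:
            gap = abs(getattr(cpu, stat)().item() - getattr(dev, stat)().item())
            worst[stat] = max(worst[stat], gap)
    return worst


def within_thresholds(errors: dict, thresholds: dict = THRESHOLDS) -> bool:
    """True if every measured error is strictly below its threshold."""
    return all(errors[name] < limit for name, limit in thresholds.items())


if __name__ == "__main__":
    measured = {"sum": 3.05e-05, "mean": 2.98e-08, "std": 7.45e-09}
    print("PASS" if within_thresholds(measured) else "FAIL")
```

Note that a test of this kind compares tensor statistics across devices; it does not compare model outputs for the same input.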

Sample Output

2026-05-11 08:01:13,904 - INFO - VTP-Base Vision Encoder Ascend NPU Inference
2026-05-11 08:01:19,212 - INFO - Model loaded and moved to npu:0!
2026-05-11 08:01:19,289 - INFO - Using random input tensor (256x256)...
2026-05-11 08:01:19,296 - INFO - Input shape: torch.Size([1, 3, 256, 256])
2026-05-11 08:01:19,296 - INFO - Running inference...
2026-05-11 08:01:25,505 - INFO - Features shape: torch.Size([1, 64])
2026-05-11 08:01:25,506 - INFO - Inference time: 6208.81 ms
2026-05-11 08:01:25,937 - INFO - Features (first 10): [ 0.04618426 -0.0475768 ...]
2026-05-11 08:01:25,937 - INFO - Inference completed successfully!
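
To pull the shapes and the timing out of log.txt programmatically, a small parser along these lines may be enough. It is a sketch that assumes the log line format shown above stays stable; `parse_log` is a hypothetical helper.

```python
import re

# Three lines copied from the sample output above.
SAMPLE_LOG = """\
2026-05-11 08:01:19,296 - INFO - Input shape: torch.Size([1, 3, 256, 256])
2026-05-11 08:01:25,505 - INFO - Features shape: torch.Size([1, 64])
2026-05-11 08:01:25,506 - INFO - Inference time: 6208.81 ms
"""

SHAPE_RE = re.compile(r"(Input|Features) shape: torch\.Size\(\[([\d, ]+)\]\)")
TIME_RE = re.compile(r"Inference time: ([\d.]+) ms")


def parse_log(text: str) -> dict:
    """Extract input/feature shapes and inference time (ms) from a run log."""
    result = {}
    for label, dims in SHAPE_RE.findall(text):
        result[f"{label.lower()}_shape"] = tuple(int(d) for d in dims.split(","))
    match = TIME_RE.search(text)
    if match:
        result["inference_ms"] = float(match.group(1))
    return result


if __name__ == "__main__":
    print(parse_log(SAMPLE_LOG))
```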

Performance Reference

Metric	Value
Inference time (NPU)	~6.2 s (single run, from the sample output above)
Feature dimension	64
Input size	256x256x3
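
The ~6.2 s figure comes from a single run and may include one-time setup cost; the source does not separate the two. A rough sketch of a timing helper that reports the first call separately from later calls is shown below. `benchmark` and its parameters are illustrative names, not part of inference.py.

```python
import time
from statistics import mean


def benchmark(fn, warmup: int = 2, iters: int = 5, sync=None) -> dict:
    """Time fn(): one cold call, `warmup` untimed-in-stats calls, then `iters` calls.

    `sync`, if given, is called before each clock read so that asynchronous
    device work is finished before the time is taken.
    """
    def timed_call() -> float:
        if sync:
            sync()
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        return (time.perf_counter() - start) * 1000.0  # milliseconds

    first_ms = timed_call()      # cold run, reported separately
    for _ in range(warmup):      # warm-up runs, results discarded
        timed_call()
    steady = [timed_call() for _ in range(iters)]
    return {
        "first_ms": first_ms,
        "steady_mean_ms": mean(steady),
        "steady_min_ms": min(steady),
    }


if __name__ == "__main__":
    print(benchmark(lambda: sum(range(100_000))))
```

On an Ascend device, `fn` would wrap the model's forward pass, and `sync` could be `torch.npu.synchronize`, assuming torch_npu is imported.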

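Notes

The VTP-Base model outputs a 64-dimensional visual feature vector.
The precision test compares state_dict tensors on CPU against the same tensors on NPU.
Inference supports either random input or a user-supplied image.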
