DINOv2 ViT-small Patch14 Reg4 LVD142M Ascend Deployment Guide

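Overview

This project provides a deployment recipe for the DINOv2 ViT-small (lvd142m) model on Huawei Ascend NPUs. It uses PyTorch with torch_npu for high-performance image feature extraction inference.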

Model information

Attribute	Value
Model name	vit_small_patch14_reg4_dinov2.lvd142m
Architecture	Vision Transformer (ViT)
Parameters	22.1M
GMACs	29.6
Image size	518 x 518
Output feature dimension	384
Pretraining method	DINOv2
Pretraining dataset	LVD-142M
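As a quick sanity check on the table, the sequence length the transformer processes follows from the patch size and register count encoded in the model name (patch14, reg4). The sketch below assumes one class token alongside the four register tokens; the function name and its parameters are illustrative and not part of this project.

```python
def token_count(image_size: int = 518, patch_size: int = 14,
                num_registers: int = 4, num_class_tokens: int = 1) -> dict:
    """Count the tokens a ViT processes for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    grid = image_size // patch_size      # patches along each side
    patches = grid * grid                # patch tokens
    total = patches + num_registers + num_class_tokens
    return {"grid": grid, "patches": patches, "total": total}


if __name__ == "__main__":
    # 518 / 14 = 37 patches per side, so 37 * 37 patch tokens plus 5 extra tokens.
    print(token_count())
```

Each of those tokens carries a 384-dimensional embedding, which is the output feature dimension listed in the table.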

Environment requirements

NPU: Atlas 910B3
CANN: 8.5.1+
Python: 3.11
PyTorch: 2.9.0+ with torch_npu
safetensors: 0.7.0+
torchvision: 0.24.0+
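The Python package minimums above can be checked before running anything heavier. This is a minimal sketch, assuming the packages are installed under the distribution names `torch`, `safetensors`, and `torchvision`; the helper names are hypothetical, and pre-release suffixes such as `rc1` are ignored by the comparison.

```python
import re
from importlib import metadata

# Minimum versions taken from the requirements list above.
MINIMUMS = {"torch": "2.9.0", "safetensors": "0.7.0", "torchvision": "0.24.0"}


def version_tuple(version: str) -> tuple:
    """Turn '2.9.0+cpu' or '0.24.0rc1' into a comparable tuple such as (2, 9, 0)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment(minimums=MINIMUMS) -> dict:
    """Report 'ok', 'missing', or 'too old (...)' for each required package."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[package] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```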

Quick deployment

1. Create the container

docker run -itd \
  --name=test-vit_small \
  --privileged \
  --ipc=host \
  --net=host \
  --device=/dev/davinci_manager \
  --device=/dev/devmm_svm \
  --device=/dev/hisi_hdc \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  --device=/dev/davinci4 \
  --device=/dev/davinci5 \
  --device=/dev/davinci6 \
  --device=/dev/davinci7 \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
  -v /usr/local/sbin:/usr/local/sbin:ro \
  -v /data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m:/data/vit_model \
  -v /home:/home \
  -w /data/vit_model \
  quay.io/ascend/vllm-ascend:v0.18.0rc1 \
  /bin/bash

2. Run inference

Without --image, the script runs on a random tensor:

docker exec test-vit_small bash -c "source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
cd /data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m-ascend && \
python3 inference.py --model_path /data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m --weight_file model.safetensors"

To run on a specific image, pass its path with --image:

docker exec test-vit_small bash -c "source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
cd /data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m-ascend && \
python3 inference.py --model_path /data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m --weight_file model.safetensors --image /path/to/image.jpg"

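Inference parameters

Parameter	Default	Description
`--model_path`	/data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m	Model directory
`--image`	None	Image path; a random tensor is used if omitted
`--weight_file`	pytorch_model.bin	Weight file name
`--precision_test`	False	Run the precision test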
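The source of inference.py is not reproduced in this guide, so the following is only a sketch of how a command-line interface with these four parameters and defaults could be declared with argparse. The function name `build_parser` and the help strings are assumptions.

```python
import argparse

DEFAULT_MODEL_PATH = "/data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m"


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented options with their documented defaults."""
    parser = argparse.ArgumentParser(description="DINOv2 ViT-small feature extraction")
    parser.add_argument("--model_path", default=DEFAULT_MODEL_PATH,
                        help="model directory")
    parser.add_argument("--image", default=None,
                        help="image path; a random tensor is used if omitted")
    parser.add_argument("--weight_file", default="pytorch_model.bin",
                        help="weight file name inside the model directory")
    parser.add_argument("--precision_test", action="store_true",
                        help="run the precision test")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Because the default weight file is pytorch_model.bin, the commands in this guide pass `--weight_file model.safetensors` explicitly.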

Precision test

docker exec test-vit_small bash -c "source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
cd /data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m-ascend && \
python3 inference.py --model_path /data/ysws/agentsp/vit_small_patch14_reg4_dinov2.lvd142m --weight_file model.safetensors --precision_test"

Precision test results

Metric	Measured	Threshold	Status
Max Error (sum)	3.05e-05	< 1e-3	PASS
Max Error (mean)	5.96e-08	< 1e-5	PASS
Max Error (std)	2.98e-08	< 1e-5	PASS
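The pass/fail logic behind a table like this can be sketched as follows. It assumes the test reduces each tensor to a sum, a mean, and a standard deviation on both CPU and NPU, then records the largest absolute difference per metric; that procedure is an assumption, since the script itself is not shown here. The thresholds are the ones from the table, and the sample values in the usage section are made up for illustration.

```python
# Thresholds from the results table; an error must be strictly below its limit.
THRESHOLDS = {"sum": 1e-3, "mean": 1e-5, "std": 1e-5}


def max_abs_error(reference, candidate) -> float:
    """Largest absolute difference between paired reference and candidate values."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def evaluate(reference: dict, candidate: dict, thresholds=THRESHOLDS) -> dict:
    """Map each metric to (max error, 'PASS' or 'FAIL')."""
    report = {}
    for metric, limit in thresholds.items():
        error = max_abs_error(reference[metric], candidate[metric])
        report[metric] = (error, "PASS" if error < limit else "FAIL")
    return report


if __name__ == "__main__":
    # Hypothetical per-tensor reductions: CPU values first, NPU values second.
    cpu = {"sum": [10.0, -4.0], "mean": [0.5, 0.25], "std": [1.0, 0.9]}
    npu = {"sum": [10.00003, -4.0], "mean": [0.5, 0.25], "std": [1.0, 0.9]}
    for metric, (error, status) in evaluate(cpu, npu).items():
        print(f"Max Error ({metric}): {error:.2e} {status}")
```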

Performance data

Operation	Time
CPU reference computation (20 tensors)	0.0188s
NPU inference (20 tensors)	0.2301s

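Performance metrics

Metric	Value
Single-image inference time	~5.6s (including compilation)
Image size	518 x 518
Output features	384-dimensional
NPU utilization	Depends on model loading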

Key configuration

Standard DINOv2 image preprocessing

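from torchvision import transforms

# Resize to the model's input size, scale to [0, 1], then normalize per channel.
preprocess = transforms.Compose([
    transforms.Resize((518, 518), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])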
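To make the last two steps of that pipeline concrete, the sketch below reproduces their arithmetic for a single RGB pixel in plain Python: ToTensor scales 8-bit values into [0, 1], and Normalize then subtracts the channel mean and divides by the channel standard deviation. The helper name is illustrative, and resizing is not modeled.

```python
# Per-channel statistics used by the Normalize step above.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=MEAN, std=STD) -> tuple:
    """Scale 8-bit RGB values to [0, 1], then apply (x - mean) / std per channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    # A saturated red channel maps to a positive value, an empty green channel
    # to a negative one.
    print(normalize_pixel((255, 0, 128)))
```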

File structure

vit_small_patch14_reg4_dinov2.lvd142m-ascend/
├── README.md       # This document
├── inference.py    # Inference script
└── log.txt         # Run log

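Notes

The first inference includes compilation time; subsequent inferences are faster.
Image preprocessing uses bicubic interpolation, consistent with the original DINOv2 configuration.
Precision test: the error between NPU and CPU is very small; the maximum sum error is 3.05e-05, well below the 1e-3 threshold.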
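Because the first call includes compilation, it helps to time it separately from steady-state calls. The helper below is a generic sketch and not code from inference.py; `fn` stands for any zero-argument callable that runs one inference and blocks until the result is ready, for example by copying the output back to the CPU.

```python
import time


def time_calls(fn, extra_warmup: int = 0, repeats: int = 10) -> dict:
    """Time fn(), reporting the first (possibly compile-heavy) call on its own."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    # Optional untimed calls to let caches and compiled kernels settle.
    for _ in range(extra_warmup):
        fn()

    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)

    return {
        "first_call_s": first,
        "steady_mean_s": sum(durations) / len(durations),
        "steady_min_s": min(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a callable that runs the model once.
    stats = time_calls(lambda: sum(range(100_000)), extra_warmup=1, repeats=5)
    print(stats)
```

Reporting the first call and the steady-state mean side by side makes it clear how much of a figure like the single-image time above is one-off compilation.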
