DINOv3 ViT-small Patch16 LVD1689M Ascend Deployment Guide

Overview

This project provides a deployment of the DINOv3 ViT-small (lvd1689m) model on Huawei Ascend NPUs, using PyTorch + torch_npu for high-performance image feature extraction inference.

Model information

Property	Value
Model name	vit_small_patch16_dinov3.lvd1689m
Architecture	Vision Transformer (ViT)
Parameters	22M
Image size	256 x 256
Output feature dimension	384
Pooling	Average (avg)
Pretraining method	DINOv3
Pretraining data	LVD-1689M

Environment

Item	Version/Details
Device	Ascend 910B
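
As a quick environment check, the sketch below picks the NPU when torch_npu can be imported and a device is visible, and falls back to the CPU otherwise. The helper name `pick_device` is hypothetical and is not taken from inference.py; the sketch assumes that importing torch_npu registers the `torch.npu` namespace.

```python
def pick_device() -> str:
    """Return "npu:0" if an Ascend NPU looks usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (imported for its side effect of registering torch.npu)
    except (ImportError, OSError):
        # Either package is missing, or the Ascend runtime libraries failed to load.
        return "cpu"
    if torch.npu.is_available():
        return "npu:0"
    return "cpu"


device = pick_device()
print(device)
```

The returned string can be passed to `tensor.to(device)` or `model.to(device)`.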

Running inference

Precision test

cd vit_small_patch16_dinov3.lvd1689m-ascend/
python inference.py --precision_test

Inference test

cd vit_small_patch16_dinov3.lvd1689m-ascend/
python inference.py --model_path vit_small_patch16_dinov3.lvd1689m --weight_file model.safetensors
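
The command above passes a `.safetensors` file, while the documented default is `pytorch_model.bin`. One way a script could handle both is to dispatch on the file extension, as in the minimal sketch below. The function names `weight_format` and `load_state_dict` are hypothetical; how inference.py actually loads weights is not shown in this document.

```python
from pathlib import Path


def weight_format(weight_file: str) -> str:
    """Classify a weight file by its extension."""
    suffix = Path(weight_file).suffix.lower()
    if suffix == ".safetensors":
        return "safetensors"
    if suffix in {".bin", ".pt", ".pth"}:
        return "torch"
    raise ValueError(f"unsupported weight file: {weight_file}")


def load_state_dict(model_path: str, weight_file: str):
    """Load a state dict onto the CPU; heavy imports are deferred until needed."""
    path = Path(model_path) / weight_file
    if weight_format(weight_file) == "safetensors":
        from safetensors.torch import load_file
        return load_file(str(path))
    import torch
    return torch.load(path, map_location="cpu")


print(weight_format("model.safetensors"))
print(weight_format("pytorch_model.bin"))
```

Loading onto the CPU first and moving the model to the NPU afterwards keeps the loading step independent of the device.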

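Inference parameters

Parameter	Default	Description
`--model_path`	vit_small_patch16_dinov3.lvd1689m	Model directory
`--image`	None	Image path; a random tensor is used if omitted
`--weight_file`	pytorch_model.bin	Weight file to load (e.g. model.safetensors)
`--precision_test`	False	Run the precision test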
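
For reference, the table above maps onto an `argparse` definition along these lines. This is a sketch reconstructed from the documented flags and defaults, not the parser from inference.py; the `build_parser` name and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="DINOv3 ViT-small inference on Ascend NPU")
    parser.add_argument("--model_path", default="vit_small_patch16_dinov3.lvd1689m",
                        help="model directory")
    parser.add_argument("--image", default=None,
                        help="image path; a random tensor is used if omitted")
    parser.add_argument("--weight_file", default="pytorch_model.bin",
                        help="weight file to load")
    parser.add_argument("--precision_test", action="store_true",
                        help="run the precision test")
    return parser


# Parse the same flags as the inference test command.
args = build_parser().parse_args(["--weight_file", "model.safetensors"])
print(args.model_path, args.image, args.weight_file, args.precision_test)
```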

Precision test results

Metric	Measured	Threshold	Status
Max Error (sum)	6.10e-05	< 1e-3	PASS
Max Error (mean)	2.38e-07	< 1e-5	PASS
Max Error (std)	1.86e-09	< 1e-5	PASS
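
The document does not state how these metrics are computed. One plausible reading, sketched below, is that each tensor is reduced to its sum, mean, and standard deviation on both devices, and the largest absolute difference across all tensors is compared against the threshold. The function names, the use of the sample standard deviation, and the toy data are assumptions made for illustration.

```python
import statistics

# Thresholds copied from the results table.
THRESHOLDS = {"sum": 1e-3, "mean": 1e-5, "std": 1e-5}

# Summary statistics applied to each (flattened) tensor.
REDUCERS = {"sum": sum, "mean": statistics.fmean, "std": statistics.stdev}


def max_errors(reference, candidate):
    """Largest absolute difference of each statistic over all tensor pairs."""
    return {
        name: max(abs(fn(r) - fn(c)) for r, c in zip(reference, candidate))
        for name, fn in REDUCERS.items()
    }


def verdicts(errors):
    """Map each metric to PASS or FAIL against its threshold."""
    return {name: "PASS" if errors[name] < THRESHOLDS[name] else "FAIL"
            for name in THRESHOLDS}


# Toy data standing in for CPU and NPU outputs, flattened to lists of floats.
cpu_out = [[0.10, 0.20, 0.30], [1.00, 2.00, 3.00]]
npu_out = [[0.10, 0.20, 0.300001], [1.00, 2.00, 3.00]]
print(verdicts(max_errors(cpu_out, npu_out)))

# The measured values from the table, checked against the same thresholds.
reported = {"sum": 6.10e-05, "mean": 2.38e-07, "std": 1.86e-09}
print(verdicts(reported))
```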

Performance data

Operation	Time
CPU reference computation (20 tensors)	0.0356s
NPU inference (20 tensors)	0.2478s

Performance metrics

Metric	Value
Single-image inference time	~5.5s (including compilation)
Image size	256 x 256
Output features	384-dim
Throughput	0.18 images/sec
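
The throughput figure is consistent with taking the reciprocal of the single-image time, so it appears to include the one-off compilation cost. The short calculation below reproduces it and also derives an average time per tensor from the 20-tensor NPU run. Whether those 20 tensors pass through the full model is not stated here, so the per-tensor figure is only indicative.

```python
# Figures copied from the performance tables.
first_image_seconds = 5.5   # single-image time, compilation included
npu_batch_seconds = 0.2478  # NPU run over 20 tensors
num_tensors = 20

throughput_with_compile = 1.0 / first_image_seconds
per_tensor_ms = npu_batch_seconds / num_tensors * 1000.0

print(f"{throughput_with_compile:.2f} images/sec")  # 0.18 images/sec
print(f"{per_tensor_ms:.2f} ms per tensor")         # 12.39 ms per tensor
```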

File structure

vit_small_patch16_dinov3.lvd1689m-ascend/
├── README.md       # This document
├── inference.py    # Inference script
└── test.log        # Run log

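Notes

The first inference includes compilation time; subsequent inferences are faster.
Image preprocessing uses bilinear interpolation.
Precision test: the NPU and CPU results differ only marginally; the largest sum error is 6.10e-05, well below the 1e-3 threshold.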
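
Because the first call includes compilation, steady-state latency is best measured separately from it. The sketch below times the first call on its own and then averages later calls. The helper `time_after_warmup` and the stand-in workload `fake_infer` are hypothetical; in a real measurement the callable would run the model and wait for the device to finish (for example with `torch.npu.synchronize()`, assuming torch_npu is loaded) before returning.

```python
import time


def time_after_warmup(fn, extra_warmup=0, runs=10):
    """Return (first_call_seconds, mean_steady_state_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    fn()  # first call: includes any one-off compilation cost
    first = time.perf_counter() - start

    for _ in range(extra_warmup):
        fn()  # optional additional warm-up calls, not timed

    start = time.perf_counter()
    for _ in range(runs):
        fn()
    steady = (time.perf_counter() - start) / runs
    return first, steady


calls = {"n": 0}


def fake_infer():
    # Stand-in for a model call: the first invocation is slow, mimicking compilation.
    calls["n"] += 1
    time.sleep(0.05 if calls["n"] == 1 else 0.001)


first, steady = time_after_warmup(fake_infer, runs=5)
print(f"first call: {first:.4f}s, steady state: {steady:.4f}s")
```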
