WebSSL-DINO1B-224 Ascend NPU Deployment Guide

Project Overview

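WebSSL-DINO1B-224 is a Vision Transformer (ViT) with about 1B parameters, trained with the DINOv2 self-supervised learning method on 2 billion web images. This project provides a setup for deploying it on Huawei Ascend NPUs.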

Features

Inference acceleration on Ascend NPUs
CPU vs. NPU accuracy comparison test
Image feature extraction
224x224 input resolution

Requirements

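Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
Python packages: transformers (with Dinov2Model support) and Pillow, as used in the usage example
Docker: container named test-modelagent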

Directory Structure

/data/ysws/agentsp/webssl-dino1b-full2b-224-ascend/
├── inference.py          # Accuracy test script
├── log.txt               # Test log
├── README.md             # This document
├── test_image_0.png      # Sample test image
├── test_image_1.png      # Sample test image
└── test_image_2.png      # Sample test image

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set the environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh
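To confirm that the script took effect, a quick check like the following can be run from the same shell. It is a minimal sketch: missing_cann_vars is a hypothetical helper, and it assumes set_env.sh exports ASCEND_TOOLKIT_HOME and ASCEND_OPP_PATH, which CANN toolkits typically do; verify the list against your installation.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumption: variables exported by CANN's set_env.sh; adjust for your CANN version.
EXPECTED_VARS = ("ASCEND_TOOLKIT_HOME", "ASCEND_OPP_PATH")


def missing_cann_vars(
    env: Optional[Mapping[str, str]] = None,
    names: Iterable[str] = EXPECTED_VARS,
) -> List[str]:
    """Return the expected CANN variables that are unset or empty (hypothetical helper)."""
    if env is None:
        env = os.environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_cann_vars()
    if missing:
        print("Not set:", ", ".join(missing))
        print("Run: source /usr/local/Ascend/ascend-toolkit/set_env.sh")
    else:
        print("CANN environment variables are set.")
```

Environment variables are per process, so the check only reflects the sourced environment when Python is started from the shell that ran set_env.sh.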

3. Prepare the model files

Place the model files in /data/ysws/agentsp/webssl-dino1b-full2b-224/:

config.json - model configuration
preprocessor_config.json - preprocessor configuration
model.safetensors - model weights (about 4.5GB)
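Before running inference.py it can save time to confirm that these three files are in place. A minimal sketch: missing_model_files is a hypothetical helper, and it only checks that each file exists and is non-empty, not that the weights are valid.

```python
from pathlib import Path
from typing import List

MODEL_DIR = "/data/ysws/agentsp/webssl-dino1b-full2b-224"
REQUIRED_FILES = ("config.json", "preprocessor_config.json", "model.safetensors")


def missing_model_files(model_dir: str = MODEL_DIR) -> List[str]:
    """Return required files that are absent or empty (hypothetical helper)."""
    root = Path(model_dir)
    missing = []
    for name in REQUIRED_FILES:
        path = root / name
        # An empty file usually points to an interrupted download or copy.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        size_gb = (Path(MODEL_DIR) / "model.safetensors").stat().st_size / 1e9
        print(f"All files present; model.safetensors is {size_gb:.2f} GB")
```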

4. Run inference and the accuracy test

cd /data/ysws/agentsp/webssl-dino1b-full2b-224-ascend/
python3 inference.py

Testing and Validation

Accuracy Test Results (NPU vs. CPU reference)

Metric	Measured	Threshold	Status
Max Error (sum)	1.72e-01	< 2.40e-01	PASS
Max Error (mean)	1.42e-05	< 1.00e-04	PASS
Max Error (std)	1.14e-04	< 1.00e-03	PASS
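This document does not spell out how inference.py computes these metrics. One plausible reading, sketched below purely as an assumption, is that the same inputs are run on CPU and NPU, the sum, mean and standard deviation of each output are compared, and the largest absolute difference over all outputs is checked against the thresholds. The helper names and the use of the population standard deviation are also assumptions.

```python
import statistics
from typing import Dict, List, Sequence

# Thresholds copied from the table above.
THRESHOLDS = {"sum": 2.40e-01, "mean": 1.00e-04, "std": 1.00e-03}


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Sum, mean and population standard deviation of one flattened output."""
    return {
        "sum": sum(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


def max_errors(cpu_outputs: List[Sequence[float]],
               npu_outputs: List[Sequence[float]]) -> Dict[str, float]:
    """Largest absolute CPU-vs-NPU difference of each statistic (assumed metric)."""
    if len(cpu_outputs) != len(npu_outputs):
        raise ValueError("CPU and NPU output counts differ")
    errors = {key: 0.0 for key in THRESHOLDS}
    for cpu, npu in zip(cpu_outputs, npu_outputs):
        ref, got = summarize(cpu), summarize(npu)
        for key in errors:
            errors[key] = max(errors[key], abs(ref[key] - got[key]))
    return errors


def verdicts(errors: Dict[str, float]) -> Dict[str, str]:
    """PASS when an error is strictly below its threshold, as in the table."""
    return {key: "PASS" if errors[key] < THRESHOLDS[key] else "FAIL"
            for key in THRESHOLDS}
```

Feeding in the measured values, verdicts({"sum": 1.72e-01, "mean": 1.42e-05, "std": 1.14e-04}) reports PASS for all three. The sum threshold is much looser than the others because rounding differences accumulate over every summed element, whereas the mean and std are normalized by the element count.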

Performance

Operation	Time
Model loading	9.02s
CPU reference computation (20 tensors)	1.17s
NPU inference (20 tensors)	0.09s
Image inference (224x224)	5.11s
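The two 20-tensor rows can be turned into per-tensor latency and a speedup figure. This is plain arithmetic on the table values, written with hypothetical helper names; it covers only that 20-tensor workload and says nothing about model loading or the first image inference.

```python
# Timings copied from the table above (seconds for 20 tensors each).
CPU_SECONDS = 1.17
NPU_SECONDS = 0.09
NUM_TENSORS = 20


def per_item_ms(total_seconds: float, count: int) -> float:
    """Average time per tensor in milliseconds."""
    return total_seconds / count * 1000.0


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


if __name__ == "__main__":
    print(f"CPU: {per_item_ms(CPU_SECONDS, NUM_TENSORS):.1f} ms/tensor")
    print(f"NPU: {per_item_ms(NPU_SECONDS, NUM_TENSORS):.1f} ms/tensor")
    print(f"Speedup: {speedup(CPU_SECONDS, NPU_SECONDS):.1f}x")
```

With these numbers this works out to 58.5 ms per tensor on CPU, 4.5 ms on NPU, and a speedup of about 13x.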

Test Log

The full test log is saved in log.txt.

Usage Example

Running Inference

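import torch
import torch_npu  # noqa: F401  importing it registers the "npu" device with PyTorch
from PIL import Image
from transformers import AutoImageProcessor, Dinov2Model

model_path = "/data/ysws/agentsp/webssl-dino1b-full2b-224"
device = torch.device("npu:0")

# Load weights in bfloat16 to halve memory use. torch_dtype is accepted by
# older and current transformers releases; newer releases also accept dtype.
model = Dinov2Model.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True
).to(device).eval()

processor = AutoImageProcessor.from_pretrained(model_path)

# Convert to RGB so grayscale, RGBA and palette images also give 3 channels
image = Image.open("path/to/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
# Move the input to the NPU and match the model's dtype
pixel_values = inputs["pixel_values"].to(device, dtype=torch.bfloat16)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values)
    # Token 0 is the CLS token; the remaining tokens are per-patch features
    cls_features = outputs.last_hidden_state[:, 0]
    patch_features = outputs.last_hidden_state[:, 1:]
    print(f"CLS features shape: {cls_features.shape}")
    print(f"Patch features shape: {patch_features.shape}")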
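The example hard-codes npu:0. When the same script also has to run on a machine without an Ascend NPU, a fallback along these lines may help. pick_device is a hypothetical helper; it relies on torch_npu providing torch.npu.is_available() once imported, and treats a failed import of torch or torch_npu as "no NPU".

```python
def pick_device(preferred: str = "npu:0") -> str:
    """Return `preferred` when an Ascend NPU is usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  importing registers the torch.npu backend
    except (ImportError, OSError):
        # torch or torch_npu is missing, or the CANN libraries cannot be loaded
        return "cpu"
    return preferred if torch.npu.is_available() else "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

In the example above this becomes device = torch.device(pick_device()). When it falls back to CPU, loading with torch_dtype=torch.float32 is usually the safer choice, because bfloat16 can be much slower than float32 on CPUs without native bfloat16 support.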

Calling the Processor

For this model AutoImageProcessor loads a BitImageProcessor, which is called as follows:

inputs = processor(
    images=image,      # PIL Image or numpy array (or a list of them for a batch)
    return_tensors="pt"
)
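What the processor does to an image is defined by preprocessor_config.json. For a BitImageProcessor the usual order is resize, center crop, rescale (multiply by rescale_factor, normally 1/255) and normalize ((x - mean) / std per channel). The sketch below uses hypothetical helper names to read the relevant keys so the values for this checkpoint can be inspected; the key names are the standard Hugging Face image-processor fields, and any key missing from the file comes back as None.

```python
import json
from typing import Any, Dict, Mapping

# Standard Hugging Face image-processor keys (assumed to appear in this config).
KEYS = (
    "image_processor_type", "do_resize", "size", "do_center_crop", "crop_size",
    "do_rescale", "rescale_factor", "do_normalize", "image_mean", "image_std",
)


def load_config(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_preprocessing(cfg: Mapping[str, Any]) -> Dict[str, Any]:
    """Pick out the preprocessing settings; missing keys map to None."""
    return {key: cfg.get(key) for key in KEYS}


def normalize_value(pixel: float, mean: float, std: float,
                    rescale_factor: float = 1 / 255) -> float:
    """Rescale one 0-255 channel value, then normalize it with mean and std."""
    return (pixel * rescale_factor - mean) / std


if __name__ == "__main__":
    path = "/data/ysws/agentsp/webssl-dino1b-full2b-224/preprocessor_config.json"
    try:
        cfg = load_config(path)
    except FileNotFoundError:
        print("Not found:", path)
    else:
        for key, value in summarize_preprocessing(cfg).items():
            print(f"{key}: {value}")
```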

Model Architecture

Property	Value
Parameter count	1B
Hidden Size	1536
Encoder Layers	40
Attention Heads	24
Image Size	224x224
Patch Size	14x14
Output feature dimension	1536
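The table determines the shapes printed by the usage example. With 14x14 patches, a 224x224 image is split into a 16x16 grid of 256 patches; Dinov2Model prepends one CLS token, so last_hidden_state has 257 tokens of width 1536. The helper below is a hypothetical sketch of that arithmetic and assumes a checkpoint without register tokens, as with the plain Dinov2Model used here.

```python
from typing import Dict, Tuple


def vit_token_layout(image_size: int = 224, patch_size: int = 14,
                     hidden_size: int = 1536, batch: int = 1) -> Dict[str, Tuple[int, ...]]:
    """Expected output shapes for a ViT with one CLS token and no registers."""
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    grid = image_size // patch_size      # patches per side
    num_patches = grid * grid
    seq_len = num_patches + 1            # +1 for the CLS token
    return {
        "grid": (grid, grid),
        "last_hidden_state": (batch, seq_len, hidden_size),
        "cls_features": (batch, hidden_size),
        "patch_features": (batch, num_patches, hidden_size),
    }


if __name__ == "__main__":
    for name, shape in vit_token_layout().items():
        print(name, shape)
```

Because the patch tokens are laid out row by row, patch_features.reshape(batch, 16, 16, 1536) recovers the spatial grid for dense tasks.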

FAQ

Q: What if the accuracy test fails?

A: Check that the NPU driver is installed correctly and that the CANN environment script (set_env.sh) has been sourced in the current shell.

Q: Which image formats are supported?

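A: Any format PIL can open, including JPEG and PNG. RGB is a color mode rather than a file format: images in other modes (grayscale, RGBA, palette) should be converted with Image.convert("RGB") before preprocessing.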

Q: Why does inference take so long?

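A: The DINOv2 model is large (about 1B parameters), and the first inference takes about 5 seconds (5.11s in the table above). Much of this is likely one-time warm-up on the NPU, such as operator compilation and memory allocation; later calls reuse the cached work and run faster, so exclude the first call when measuring latency.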
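To separate the one-time warm-up from steady-state latency, time the first call on its own and take the median of later calls. A minimal sketch with a hypothetical benchmark helper; because NPU work is launched asynchronously, pass a synchronization function (with torch_npu imported, torch.npu.synchronize) so each measurement waits for the device to finish.

```python
import time
from statistics import median
from typing import Callable, Dict, Optional


def benchmark(fn: Callable[[], object], warmup: int = 2, iters: int = 10,
              sync: Optional[Callable[[], None]] = None) -> Dict[str, float]:
    """Time the first (cold) call separately from steady-state calls."""
    if iters < 1:
        raise ValueError("iters must be at least 1")

    def timed_call() -> float:
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait until queued device work has finished
        return time.perf_counter() - start

    first = timed_call()          # includes any one-time warm-up cost
    for _ in range(warmup):       # extra warm-up calls, not recorded
        timed_call()
    steady = [timed_call() for _ in range(iters)]
    return {"first_call_s": first, "steady_median_s": median(steady)}
```

Inside the torch.no_grad() block of the usage example, for instance: stats = benchmark(lambda: model(pixel_values=pixel_values), sync=torch.npu.synchronize).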

License

This project follows the original WebSSL-DINO1B license (cc-by-nc-4.0).