google/vit-large-patch32-384 on Ascend NPU

1. Introduction

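google/vit-large-patch32-384 is the ViT-Large image classification model released by Google, built on the Vision Transformer architecture. It takes 384x384 input with a patch size of 32, has 306.6M parameters, and outputs the 1000 ImageNet-1K classes.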
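The resolution and patch size above fix the length of the token sequence the transformer sees. The short calculation below works that out; the extra token is the standard ViT [CLS] classification token, and the variable names are illustrative only.

```python
# Patch/token arithmetic for a ViT with a square input and square patches.
image_size = 384   # input resolution stated above
patch_size = 32    # patch size stated above

patches_per_side = image_size // patch_size      # patches along one edge
num_patches = patches_per_side ** 2              # patches in the whole image
seq_len = num_patches + 1                        # +1 for the [CLS] token
values_per_patch = 3 * patch_size * patch_size   # RGB values flattened per patch

print(patches_per_side, num_patches, seq_len, values_per_patch)
# prints: 12 144 145 3072
```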

This project adapts the model into a submission that runs on a single Ascend NPU (Ascend910B4), using the official AutoImageProcessor for preprocessing.
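For orientation, a single-image inference pass on the NPU might look roughly like the sketch below. It is not the repository's inference.py: the function name `classify_image`, its parameters, and the choice of `AutoModelForImageClassification` are assumptions made for illustration, and it needs torch, torch_npu, transformers and Pillow installed.

```python
MODEL_ID = "google/vit-large-patch32-384"


def classify_image(image_path, device="npu:0", top_k=5):
    """Return the top_k (label, probability) pairs for one image (hypothetical helper)."""
    # Heavy imports live inside the function so this module can be imported cheaply.
    import torch
    import torch_npu  # noqa: F401  (importing it registers the "npu" device with PyTorch)
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageClassification.from_pretrained(MODEL_ID).to(device).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    pixel_values = inputs["pixel_values"].to(device)  # shape [1, 3, 384, 384]

    with torch.no_grad():
        logits = model(pixel_values=pixel_values).logits  # shape [1, 1000]

    probs = torch.softmax(logits[0].float(), dim=-1).cpu()
    top = torch.topk(probs, k=top_k)
    return [
        (model.config.id2label[int(idx)], float(p))
        for p, idx in zip(top.values, top.indices)
    ]
```

Calling `classify_image("some_image.jpg")` (the path is a placeholder) would return a list of five label/probability pairs, comparable in form to the Top-5 output in the smoke verification section.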

2. Verification Environment

Item	Version / Details
NPU model	Ascend910B4 (29.5 GB HBM)
CANN	8.5.1
torch	2.9.0+cpu
torch_npu	2.9.0.post1+gitee7ba04
transformers	4.57.6
Python	3.11.14
npu-smi	25.5.1

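3. Running Inference

Installing dependencies

pip install -r requirements.txt
# torch_npu is usually provided by the Ascend container and does not need a separate install

Commands

# Set the HuggingFace mirror
export HF_ENDPOINT=https://hf-mirror.com

# Inference
python inference.py

# Accuracy check (CPU vs NPU)
python eval_accuracy.py

# Performance benchmark
python benchmark.py

Log paths

Inference log: logs/inference.log
Prediction results: logs/prediction.txt
Accuracy log: logs/accuracy.log
Benchmark log: logs/benchmark.log
Environment check: logs/env_check.log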

4. Smoke Verification

torch.npu.is_available(): True
torch.npu.device_count(): 1
torch.npu.get_device_name(0): Ascend910B4
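The three values above can be gathered with a small helper like the following sketch. The function name `npu_status` and the shape of the returned dictionary are hypothetical; the `torch.npu` calls are the ones shown in the output above, which become available once torch_npu is imported.

```python
def npu_status():
    """Collect NPU availability, device count and device name, or report why not."""
    try:
        import torch
        import torch_npu  # noqa: F401  (registers the torch.npu namespace)
    except ImportError as exc:
        # Neither an NPU nor torch_npu is required just to run this check.
        return {"available": False, "reason": f"import failed: {exc}"}

    available = bool(torch.npu.is_available())
    count = torch.npu.device_count() if available else 0
    name = torch.npu.get_device_name(0) if count > 0 else None
    return {"available": available, "device_count": count, "device_name": name}


print(npu_status())
```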

Top-1: sandbar, sand bar (27.13%)
Top-2: seashore, coast, seacoast, sea-coast (26.09%)
Top-3: sea lion (14.94%)
Top-4: lakeside, lakeshore (8.62%)
Top-5: promontory, headland, head, foreland (6.03%)
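The percentages above are softmax probabilities over the 1000 class logits, ranked in descending order. A dependency-free sketch of that step follows; the logits and labels in the demo are toy values, not model output, and `top_k_probs` is an illustrative name.

```python
import math


def top_k_probs(logits, labels, k=5):
    """Softmax the logits and return the k most probable (label, prob) pairs."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(labels[i], probs[i]) for i in ranked]


# Toy input for illustration only
demo = top_k_probs([2.0, 1.0, 0.0], ["a", "b", "c"], k=2)
for rank, (label, p) in enumerate(demo, start=1):
    print(f"Top-{rank}: {label} ({p:.2%})")
```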

5. Performance Reference

Metric	Value
Mean latency	23.7 ms
Min latency	23.5 ms
Max latency	24.1 ms
P50	23.7 ms
P90	24.0 ms
P95	24.0 ms
Throughput	42.14 images/sec

Warm-up: 2 runs
Timed runs: 5
Input: [1, 3, 384, 384] (384x384 resolution)
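A warm-up-then-measure loop that produces these metrics could be sketched as below. This is not the repository's benchmark.py: the function names, the linear-interpolation percentile, and the optional `sync` hook are assumptions. On an NPU one would likely pass `torch.npu.synchronize` as `sync` so that asynchronous kernels finish before the timer stops. With only 5 samples, P90 and P95 depend heavily on the percentile method chosen.

```python
import statistics
import time


def percentile(sorted_vals, q):
    """Percentile with linear interpolation between neighbouring samples."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(latencies_ms, batch_size=1):
    """Turn per-run latencies (ms) into the metrics listed in the table."""
    vals = sorted(latencies_ms)
    mean = statistics.fmean(vals)
    return {
        "mean_ms": mean,
        "min_ms": vals[0],
        "max_ms": vals[-1],
        "p50_ms": percentile(vals, 50),
        "p90_ms": percentile(vals, 90),
        "p95_ms": percentile(vals, 95),
        "throughput_ips": batch_size * 1000.0 / mean,  # images per second
    }


def benchmark(run_once, warmup=2, iters=5, sync=None):
    """Run warm-up passes untimed, then time iters passes of run_once()."""
    for _ in range(warmup):
        run_once()
    if sync:
        sync()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        if sync:
            sync()  # wait for device work before stopping the clock
        latencies.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies)


# Toy CPU workload standing in for a model forward pass
print(benchmark(lambda: sum(range(10000))))
```

Throughput here is simply batch size divided by mean latency, so a mean near 23.7 ms at batch size 1 corresponds to roughly 42 images/sec.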

6. Accuracy Evaluation (CPU vs NPU)

Metric	Value
max_abs_diff (logits)	0.026264
mean_abs_diff (logits)	0.005554
prob_max_diff	0.004743
Top-1 match	True
Top-5 match	True

Preprocessor: ViTImageProcessor (via the official AutoImageProcessor)
Conclusion: PASS. CPU and NPU inference results agree, with only minor floating-point differences.
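The metrics in this table can be computed from two logit vectors for the same image, one from CPU and one from NPU. The sketch below shows one plausible set of definitions; it is not taken from eval_accuracy.py, and in particular treating "Top-5 match" as equality of the two top-5 index sets (ignoring order) is an assumption. The demo vectors are toy values.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_indices(values, n):
    """Indices of the n largest values, best first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:n]


def compare_logits(cpu_logits, npu_logits, k=5):
    """Element-wise and ranking agreement between two logit vectors."""
    diffs = [abs(a - b) for a, b in zip(cpu_logits, npu_logits)]
    cpu_probs, npu_probs = softmax(cpu_logits), softmax(npu_logits)
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "prob_max_diff": max(abs(p - q) for p, q in zip(cpu_probs, npu_probs)),
        "top1_match": top_indices(cpu_logits, 1) == top_indices(npu_logits, 1),
        # Assumption: order inside the top-k is not required to match.
        "topk_match": set(top_indices(cpu_logits, k)) == set(top_indices(npu_logits, k)),
    }


# Toy vectors for illustration only
cpu = [3.0, 1.0, 0.5, 0.2, 0.1, -1.0]
npu = [3.01, 0.99, 0.5, 0.2, 0.1, -1.0]
report = compare_logits(cpu, npu)
print(report)
```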

7. Self-Verification Screenshot

See screenshots/self_verification.png

8. Log Files

logs/env_check.log: environment check details
logs/inference.log: inference results
logs/prediction.txt: Top-5 predictions
logs/accuracy.log: accuracy comparison
logs/benchmark.log: performance benchmark

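9. Notes

Preprocessing uses the official AutoImageProcessor (ViTImageProcessor), not a hand-written stand-in.
The first inference is slow (about 20.6 s); subsequent inferences settle at about 23.7 ms.
Model weights are not included in this project and must be downloaded from HuggingFace.
Set HF_ENDPOINT=https://hf-mirror.com to speed up the model download through the mirror.
This verification is a smoke accuracy check (CPU vs NPU consistency on the same image), not an official ImageNet accuracy evaluation.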
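If the mirror is configured from Python rather than with a shell export, the variable has to be set before transformers or huggingface_hub is imported, since the endpoint is read when those libraries load. A minimal sketch, assuming it sits at the very top of the entry script:

```python
import os

# Set the mirror before importing transformers / huggingface_hub.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

print(os.environ["HF_ENDPOINT"])
```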

10. Tags

#NPU
