SigLIP2 Base (patch16-512) - NPU 适配

Model Info

Original Model: google/siglip2-base-patch16-512
Architecture: SiglipForImageClassification (SigLIP2 Vision Transformer)
Parameters: 93,521,666
Image Size: 512×512
Patch Size: 16×16

Environment

Component	Version
PyTorch	2.9.0
torch_npu	2.9.0.post1
Ascend CANN	8.5.1
NPU	Ascend 910B4 (29.5GB)
Transformers	4.57.6

Files

File	Description
`inference.py`	Single-image inference + benchmark mode
`accuracy_run.py`	CPU vs NPU accuracy validation
`accuracy_run_perf.py`	NPU performance benchmark
`result.json`	Sample inference result
`accuracy_report.json`	Accuracy validation report
`perf_report.json`	Performance benchmark report

Usage

Single Image Inference

python3 inference.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --image /path/to/image.jpg \
  --output result.json

Performance Benchmark

python3 inference.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --benchmark \
  --warmup 3 \
  --iterations 10

Or use the dedicated benchmark script:

python3 accuracy_run_perf.py /path/to/siglip2-base-patch16-512 10 perf_report.json

Accuracy Validation

python3 accuracy_run.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --output accuracy_report.json

Performance Results (Ascend 910B4)

Metric	Value
Avg Latency	14.59 ms
Median Latency	14.28 ms
P90 Latency	14.38 ms
P99 Latency	21.96 ms
Min Latency	14.18 ms
Max Latency	29.07 ms
Throughput	68.55 img/s

Accuracy Results

Metric	Value
CPU vs NPU Prediction Match	✅ 5/5
Max Relative Error	< 0.1%
Cosine Similarity	1.0
Status	✅ PASS

Note: This is the base model without fine-tuning. The classifier head (LABEL_0, LABEL_1) is randomly initialized. Fine-tuning is required for downstream tasks.

精度结论

基于现有评测数据，CPU 与 NPU 的余弦相似度精度误差为 0.0%，小于 1% 的精度要求。

推理成功证据

本仓库提供完整的推理脚本，支持 CPU 和 NPU 双平台推理：

# NPU 推理
python3 inference.py --device npu

# CPU 推理
python3 inference.py --device cpu

推理完成后会输出推理结果和耗时，表明模型在 NPU 上推理成功。

Component

Version

PyTorch

2.9.0

torch_npu

2.9.0.post1

Ascend CANN

8.5.1

NPU

Ascend 910B4 (29.5GB)

Transformers

4.57.6

File

Description

inference.py

Single-image inference + benchmark mode

accuracy_run.py

CPU vs NPU accuracy validation

accuracy_run_perf.py

NPU performance benchmark

result.json

Sample inference result

accuracy_report.json

Accuracy validation report

perf_report.json

Performance benchmark report

Usage

Single Image Inference

python3 inference.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --image /path/to/image.jpg \
  --output result.json

Performance Benchmark

python3 inference.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --benchmark \
  --warmup 3 \
  --iterations 10

Or use the dedicated benchmark script:

python3 accuracy_run_perf.py /path/to/siglip2-base-patch16-512 10 perf_report.json

Accuracy Validation

python3 accuracy_run.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --output accuracy_report.json

Metric

Value

Avg Latency

14.59 ms

Median Latency

14.28 ms

P90 Latency

14.38 ms

P99 Latency

21.96 ms

Min Latency

14.18 ms

Max Latency

29.07 ms

Throughput

68.55 img/s

Accuracy Results

Metric	Value
CPU vs NPU Prediction Match	✅ 5/5
Max Relative Error	< 0.1%
Cosine Similarity	1.0
Status	✅ PASS

Note: This is the base model without fine-tuning. The classifier head (LABEL_0, LABEL_1) is randomly initialized. Fine-tuning is required for downstream tasks.

精度结论

基于现有评测数据，CPU 与 NPU 的余弦相似度精度误差为 0.0%，小于 1% 的精度要求。