g
gcw_C8PI9e90/siglip2-base-patch16-512-npu
模型介绍文件和版本Pull Requests讨论分析

SigLIP2 Base (patch16-512) - NPU 适配

Model Info

  • Original Model: google/siglip2-base-patch16-512
  • Architecture: SiglipForImageClassification (SigLIP2 Vision Transformer)
  • Parameters: 93,521,666
  • Image Size: 512×512
  • Patch Size: 16×16

Environment

ComponentVersion
PyTorch2.9.0
torch_npu2.9.0.post1
Ascend CANN8.5.1
NPUAscend 910B4 (29.5GB)
Transformers4.57.6

Files

FileDescription
inference.pySingle-image inference + benchmark mode
accuracy_run.pyCPU vs NPU accuracy validation
accuracy_run_perf.pyNPU performance benchmark
result.jsonSample inference result
accuracy_report.jsonAccuracy validation report
perf_report.jsonPerformance benchmark report

Usage

Single Image Inference

python3 inference.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --image /path/to/image.jpg \
  --output result.json

Performance Benchmark

python3 inference.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --benchmark \
  --warmup 3 \
  --iterations 10

Or use the dedicated benchmark script:

python3 accuracy_run_perf.py /path/to/siglip2-base-patch16-512 10 perf_report.json

Accuracy Validation

python3 accuracy_run.py \
  --model_path /path/to/siglip2-base-patch16-512 \
  --output accuracy_report.json

Performance Results (Ascend 910B4)

MetricValue
Avg Latency14.59 ms
Median Latency14.28 ms
P90 Latency14.38 ms
P99 Latency21.96 ms
Min Latency14.18 ms
Max Latency29.07 ms
Throughput68.55 img/s

Accuracy Results

MetricValue
CPU vs NPU Prediction Match✅ 5/5
Max Relative Error< 0.1%
Cosine Similarity1.0
Status✅ PASS

Note: This is the base model without fine-tuning. The classifier head (LABEL_0, LABEL_1) is randomly initialized. Fine-tuning is required for downstream tasks.

精度结论

基于现有评测数据,CPU 与 NPU 的 余弦相似度 精度误差为 0.0%,小于 1% 的精度要求。

推理成功证据

本仓库提供完整的推理脚本,支持 CPU 和 NPU 双平台推理:

# NPU 推理
python3 inference.py --device npu

# CPU 推理
python3 inference.py --device cpu

推理完成后会输出推理结果和耗时,表明模型在 NPU 上推理成功。

下载使用量0