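This document records the quick adaptation and validation results for nvidia/RADIO-L on Ascend NPUs. RADIO-L is a vision foundation model from NVIDIA Research and the Large variant of the RADIO family. It produces a global summary feature (`summary`) and spatial features (`features`) for an input image.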
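The model returns a tuple:

- `summary`: global image representation, with shape `(B, C)`
- `features`: spatial features, with shape `(B, T, D)`, usable for dense prediction tasks or as input to an LLM

Related download links:
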
| Component | Version |
|---|---|
| torch | 2.9.0+cpu |
| torch-npu | 2.9.0.post1+gitee7ba04 |
| transformers | 4.40.1 |
| timm | 1.0.27 |
| Pillow | latest |
| einops | latest |
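
- 2 logical cards (Ascend 910)
- Weight path: `/opt/atomgit/weight/RADIO-L`
- 8.5.1

Download the weights from either source:

```bash
python3 -m atomgit download hf_mirrors/nvidia/RADIO-L -d /opt/atomgit/weight/RADIO-L
```

```bash
modelscope download --model nvidia/RADIO-L --local_dir /opt/atomgit/weight/RADIO-L
```

Install the dependencies:

```bash
pip install torch transformers timm einops Pillow
```

Note: torch-npu is preinstalled in this environment, so no additional installation is needed.
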
Run the inference script:

```bash
python3 inference.py assets/demo.png
```

```text
============================================================
RADIO-L NPU Inference
============================================================
[1/4] Loading image processor...
[2/4] Loading model from local weights...
[3/4] Moving model to NPU...
[4/4] Loading and preprocessing image...
Input image size: (1182, 718)
Pixel values shape: torch.Size([1, 3, 768, 1264])
Pixel values dtype: torch.float32
Pixel values device: npu:0
Running warmup...
Running inference...
============================================================
Results
============================================================
Summary shape: torch.Size([1, 3072])
Summary dtype: torch.float32
Summary device: npu:0
Summary mean: -0.004366
Summary std: 0.292162
Summary min: -2.135583
Summary max: 1.106211
Features shape: torch.Size([1, 3792, 1024])
Features dtype: torch.float32
Features device: npu:0
Features mean: 0.003050
Features std: 0.503550
Inference time: 95.02 ms
```

The inference outputs are saved to `output/summary.pt` and `output/features.pt`.

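
The shapes in the inference log can be cross-checked by hand. The sketch below assumes the patch size of 16 mentioned in this document and that every non-overlapping patch yields exactly one spatial token. `spatial_token_count` is a hypothetical helper written for illustration, not part of RADIO.

```python
def spatial_token_count(height: int, width: int, patch_size: int = 16) -> int:
    """Number of spatial tokens T for an input of size (height, width).

    Hypothetical helper: assumes one token per non-overlapping patch.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    return (height // patch_size) * (width // patch_size)


# The logged pixel values are (1, 3, 768, 1264): 48 rows x 79 columns of patches.
tokens = spatial_token_count(768, 1264)
print(tokens)  # 3792, matching the logged features shape (1, 3792, 1024)
```

Under these assumptions, the `T` dimension of `features` is determined entirely by the preprocessed resolution, not by the original image size.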
Run the benchmark script:

```bash
python3 benchmark.py assets/demo.png
```

```text
============================================================
RADIO-L NPU Benchmark
============================================================
Batch size: 1
Input shape: torch.Size([1, 3, 768, 1264])
Device: npu:0
Warmup 10 iterations...
Benchmarking 50 iterations...
============================================================
Benchmark Results
============================================================
Mean latency: 92.96 ms
Median latency: 92.83 ms
Min latency: 91.98 ms
Max latency: 95.24 ms
P99 latency: 94.92 ms
Std dev: 0.62 ms
Throughput: 10.76 samples/sec
```

The benchmark results are saved to `output/benchmark.txt`.

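
For readers who want to see how such figures relate to each other, here is a minimal sketch of summarizing per-iteration latencies. It is not taken from `benchmark.py`: the function name, the linearly interpolated P99, and the population standard deviation are all assumptions, and the script may define these differently.

```python
import math
import statistics


def summarize_latencies(latencies_ms: list[float], batch_size: int = 1) -> dict[str, float]:
    """Summarize latencies in milliseconds (hypothetical helper, assumed definitions)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # P99 by linear interpolation between the two closest ranks (an assumption).
    pos = 0.99 * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    p99 = ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])
    mean = statistics.fmean(ordered)
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p99": p99,
        "std": statistics.pstdev(ordered),  # population std dev (an assumption)
        "throughput": batch_size * 1000.0 / mean,  # samples per second
    }


# Throughput follows from the mean latency: 1000 ms / 92.96 ms per sample.
print(round(1 * 1000.0 / 92.96, 2))  # 10.76
```

The last line shows that the reported throughput is consistent with the reported mean latency at batch size 1.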
Run the accuracy validation script:

```bash
python3 accuracy.py assets/demo.png
```

```text
============================================================
RADIO-L NPU Accuracy Validation
============================================================
[1/3] Running CPU inference (reference)...
CPU summary shape: torch.Size([1, 3072])
CPU features shape: torch.Size([1, 3792, 1024])
[2/3] Running NPU inference...
NPU summary shape: torch.Size([1, 3072])
NPU features shape: torch.Size([1, 3792, 1024])
[3/3] Computing accuracy metrics...
============================================================
Accuracy Results
============================================================
Summary (global representation):
relative_error : 0.000715
absolute_error : 0.001531
max_diff : 0.010228
cosine_similarity : 1.000134
```

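
The metric names in the log can be illustrated with a small sketch. The definitions below are one common choice and are assumptions: `accuracy.py` may compute these quantities differently, so this code is not expected to reproduce the reported numbers. `accuracy_metrics` is a hypothetical helper operating on flattened outputs.

```python
import math


def accuracy_metrics(ref: list[float], out: list[float]) -> dict[str, float]:
    """Compare a reference (CPU) output with a test (NPU) output, both flattened.

    Assumed definitions (hypothetical helper):
      relative_error    = ||out - ref||_2 / ||ref||_2
      absolute_error    = mean(|out - ref|)
      max_diff          = max(|out - ref|)
      cosine_similarity = <ref, out> / (||ref||_2 * ||out||_2)
    """
    if len(ref) != len(out) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(r - o) for r, o in zip(ref, out)]
    norm_ref = math.sqrt(math.fsum(r * r for r in ref))
    norm_out = math.sqrt(math.fsum(o * o for o in out))
    dot = math.fsum(r * o for r, o in zip(ref, out))
    return {
        "relative_error": math.sqrt(math.fsum(d * d for d in diffs)) / norm_ref,
        "absolute_error": math.fsum(diffs) / len(diffs),
        "max_diff": max(diffs),
        "cosine_similarity": dot / (norm_ref * norm_out),
    }


metrics = accuracy_metrics([3.0, 4.0], [3.0, 4.5])
print(metrics["max_diff"])  # 0.5
```

In exact arithmetic cosine similarity cannot exceed 1; a value marginally above 1 from a float32 computation typically reflects rounding in the accumulated dot product and norms.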
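| File | Description |
|---|---|
| inference.py | NPU inference script |
| benchmark.py | Performance benchmark script |
| accuracy.py | Accuracy validation script |
| output/ | Run artifacts and logs |
| README.md | Adaptation document |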
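
- Loading the model requires `trust_remote_code=True`, because RADIO-L uses a custom Hugging Face model class.
- The model is moved to the NPU directly with `model.to("npu")`; no monkey-patching of third-party library code is needed.
- The input resolution must be an integer multiple of `patch_size` (16); the model handles this internally via `get_nearest_supported_resolution`.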
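
The notes above can be put together in a minimal loading sketch. It follows the usage pattern published on the model's Hugging Face card, with `cuda` swapped for `npu`; whether `inference.py` is written exactly this way is an assumption, and `run_radio_on_npu` is a hypothetical function name. It requires an Ascend environment with `torch-npu` installed.

```python
def run_radio_on_npu(weights_dir: str, image_path: str):
    """Load RADIO-L from local weights and run one image on an Ascend NPU (sketch)."""
    # Imports are local so this module can be imported on machines without an NPU stack.
    import torch
    import torch_npu  # noqa: F401  (makes the "npu" device available to PyTorch)
    from PIL import Image
    from transformers import AutoModel, CLIPImageProcessor

    processor = CLIPImageProcessor.from_pretrained(weights_dir)
    # trust_remote_code=True is required because RADIO-L ships a custom model class.
    model = AutoModel.from_pretrained(weights_dir, trust_remote_code=True)
    model = model.eval().to("npu")  # direct device move, no monkey-patching

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt", do_resize=True).pixel_values
    pixel_values = pixel_values.to("npu")

    with torch.no_grad():
        # The model returns a tuple: summary is (B, C), features is (B, T, D).
        summary, features = model(pixel_values)
    return summary, features
```

Calling `run_radio_on_npu("/opt/atomgit/weight/RADIO-L", "assets/demo.png")` in the environment described here should return the two tensors, which can then be saved with `torch.save`.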