Yanguan/RADIO-L

Deploying RADIO-L on Ascend NPU

1. Introduction

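This document records the adaptation and validation of nvidia/RADIO-L on Ascend NPUs. RADIO-L is a vision foundation model from NVIDIA Research and the Large variant of the RADIO family. It produces a global summary feature (summary) and spatial features (features) for an input image.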

The model returns a tuple:

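  • summary: global image representation with shape (B, C)
  • features: spatial features with shape (B, T, D), where T is the number of patch tokens; usable for dense prediction tasks or as input to an LLM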
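The token count T can be checked by hand. With a 16-pixel patch (the patch_size given in section 9), a 768×1264 input yields a 48×79 patch grid, i.e. 3792 tokens, which matches the features shape reported in section 5. The helper below is a hypothetical illustration of that arithmetic, assuming features hold exactly one token per patch. It is not part of the RADIO code.

```python
def num_patch_tokens(height: int, width: int, patch_size: int = 16) -> int:
    """Number of patch tokens for a height x width input (hypothetical helper)."""
    if height % patch_size or width % patch_size:
        raise ValueError(f"{height}x{width} is not a multiple of patch size {patch_size}")
    # One token per non-overlapping patch: patch rows times patch columns.
    return (height // patch_size) * (width // patch_size)


# pixel_values shape from the demo run: (1, 3, 768, 1264)
print(num_patch_tokens(768, 1264))  # 48 * 79 = 3792
```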

Related links:

  • Weights (AtomGit): https://ai.gitcode.com/hf_mirrors/nvidia/RADIO-L
  • Weights (HuggingFace): https://huggingface.co/nvidia/RADIO-L
  • Original project on GitHub: https://github.com/NVlabs/RADIO

2. Validation Environment

Component      Version
torch          2.9.0+cpu
torch-npu      2.9.0.post1+gitee7ba04
transformers   4.40.1
timm           1.0.27
Pillow         latest
einops         latest

  • NPU: 2 logical devices (Ascend 910)
  • Model path: /opt/atomgit/weight/RADIO-L
  • CANN version: 8.5.1
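One quick way to confirm that a machine matches the table above is to read the installed package versions. The sketch below uses only importlib.metadata from the standard library. The package list is copied from the table (assuming the PyPI distribution names), and the helper name is hypothetical.

```python
from importlib import metadata

# Distribution names from the validation-environment table (assumed PyPI names).
PACKAGES = ["torch", "torch_npu", "transformers", "timm", "Pillow", "einops"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:<14}{version or 'NOT INSTALLED'}")
```

A value of None (printed as NOT INSTALLED) points at a missing dependency before any model code is run.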

3. Downloading the Weights

Option 1: download from AtomGit (recommended)

python3 -m atomgit download hf_mirrors/nvidia/RADIO-L -d /opt/atomgit/weight/RADIO-L

Option 2: download from ModelScope

modelscope download --model nvidia/RADIO-L --local_dir /opt/atomgit/weight/RADIO-L
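The HuggingFace repository listed in section 1 has no download command above. A minimal sketch using huggingface_hub.snapshot_download follows. It assumes the huggingface_hub package is installed, which the validated environment does not list. The has_weights check is a hypothetical helper that only looks for config.json; it does not verify the weight files themselves.

```python
import os

DEFAULT_DIR = "/opt/atomgit/weight/RADIO-L"  # same target directory as the commands above


def download_from_huggingface(local_dir=DEFAULT_DIR):
    """Fetch the nvidia/RADIO-L repository into local_dir and return the local path."""
    from huggingface_hub import snapshot_download  # deferred: optional dependency
    return snapshot_download(repo_id="nvidia/RADIO-L", local_dir=local_dir)


def has_weights(local_dir=DEFAULT_DIR):
    """Cheap sanity check (hypothetical): the directory contains a config.json."""
    return os.path.isfile(os.path.join(local_dir, "config.json"))
```

Calling has_weights() before download_from_huggingface() avoids re-fetching a directory that is already populated.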

4. Installing Dependencies

pip install torch transformers timm einops Pillow

Note: torch-npu is preinstalled in this environment, so it does not need to be installed separately.
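After installation, it is worth confirming that torch-npu can see the devices before running the scripts. The sketch below is minimal and assumes that importing torch_npu registers the torch.npu namespace, as it does for the versions in section 2. The npu_status helper is hypothetical and reports no devices when torch or torch-npu cannot be imported.

```python
def npu_status():
    """Report whether torch-npu can see any Ascend devices (hypothetical helper)."""
    try:
        import torch
        import torch_npu  # noqa: F401  importing registers the torch.npu namespace
    except (ImportError, OSError, RuntimeError):
        # torch or torch-npu missing, or the CANN runtime libraries failed to load
        return {"available": False, "device_count": 0}
    available = bool(torch.npu.is_available())
    count = torch.npu.device_count() if available else 0
    return {"available": available, "device_count": count}


print(npu_status())
```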

5. Inference Validation

Run the inference script:

python3 inference.py assets/demo.png
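inference.py itself is not reproduced in this document. The sketch below covers the same four steps shown in the log: load the image processor, load the model from local weights, move it to the NPU, and preprocess the image. It follows the Hugging Face usage pattern for RADIO (AutoModel with trust_remote_code=True plus CLIPImageProcessor), with the device set to the NPU. The function name run_inference and its defaults are assumptions, not the actual script.

```python
def run_inference(image_path, model_dir="/opt/atomgit/weight/RADIO-L", device="npu:0"):
    """Return (summary, features) for one image. Hypothetical sketch, not inference.py."""
    # Deferred imports so this module can be imported on machines without the NPU stack.
    import torch
    import torch_npu  # noqa: F401  registers the "npu" device type with torch
    from PIL import Image
    from transformers import AutoModel, CLIPImageProcessor

    # [1/4] image processor and [2/4] model, both from the local weight directory
    processor = CLIPImageProcessor.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir, trust_remote_code=True)

    # [3/4] plain device move, no patching of third-party code
    model = model.eval().to(device)

    # [4/4] load and preprocess the image, then move the tensor to the same device
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt", do_resize=True).pixel_values
    pixel_values = pixel_values.to(device)

    with torch.no_grad():
        summary, features = model(pixel_values)
    return summary, features
```

For the demo image, `summary, features = run_inference("assets/demo.png")` would be expected to give the shapes shown in the log, (1, 3072) and (1, 3792, 1024).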

Example inference output:

============================================================
RADIO-L NPU Inference
============================================================

[1/4] Loading image processor...
[2/4] Loading model from local weights...
[3/4] Moving model to NPU...
[4/4] Loading and preprocessing image...

Input image size: (1182, 718)
Pixel values shape: torch.Size([1, 3, 768, 1264])
Pixel values dtype: torch.float32
Pixel values device: npu:0

Running warmup...
Running inference...

============================================================
Results
============================================================
Summary shape:        torch.Size([1, 3072])
Summary dtype:        torch.float32
Summary device:       npu:0
Summary mean:         -0.004366
Summary std:          0.292162
Summary min:          -2.135583
Summary max:          1.106211

Features shape:       torch.Size([1, 3792, 1024])
Features dtype:       torch.float32
Features device:      npu:0
Features mean:        0.003050
Features std:         0.503550

Inference time:       95.02 ms

The inference outputs are saved to output/summary.pt and output/features.pt.

6. Performance Benchmark

Run the benchmark script:

python3 benchmark.py assets/demo.png
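The output reports 10 untimed warmup iterations followed by 50 timed ones. Because NPU kernels launch asynchronously, each timed iteration has to wait for the device before the clock stops. The sketch below shows that pattern with the model call and the synchronization function passed in as callables. It is an assumption about how benchmark.py times inference, not its actual code. On the NPU, synchronize would be torch.npu.synchronize, and run_once would call the model on pixel_values inside torch.no_grad().

```python
import time


def measure_latencies(run_once, synchronize, warmup=10, iters=50):
    """Time `iters` calls of run_once() in milliseconds after `warmup` untimed calls.

    `synchronize` must block until queued device work has finished.
    """
    for _ in range(warmup):
        run_once()
    synchronize()  # make sure warmup work is done before timing starts
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        synchronize()  # stop the clock only after the device has finished
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```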

Example benchmark output:

============================================================
RADIO-L NPU Benchmark
============================================================

Batch size: 1
Input shape: torch.Size([1, 3, 768, 1264])
Device: npu:0
Warmup 10 iterations...
Benchmarking 50 iterations...

============================================================
Benchmark Results
============================================================
Mean latency:    92.96 ms
Median latency:  92.83 ms
Min latency:     91.98 ms
Max latency:     95.24 ms
P99 latency:     94.92 ms
Std dev:         0.62 ms
Throughput:      10.76 samples/sec

The benchmark results are saved to output/benchmark.txt.
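The summary statistics follow from the list of per-iteration latencies. With batch size 1, throughput is 1000 divided by the mean latency in milliseconds: 1000 / 92.96 ≈ 10.76 samples/sec, which matches the reported value. The sketch below computes the other fields under stated assumptions. It uses population standard deviation and P99 by linear interpolation, and benchmark.py may use different conventions.

```python
import math
import statistics


def latency_report(latencies_ms, batch_size=1):
    """Summarize per-iteration latencies (ms) into the fields printed by the benchmark."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one latency sample")
    # P99 by linear interpolation between the two closest ranks (NumPy's default method)
    rank = 0.99 * (n - 1)
    low = math.floor(rank)
    high = min(low + 1, n - 1)
    p99 = ordered[low] + (ordered[high] - ordered[low]) * (rank - low)
    mean = statistics.fmean(ordered)
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p99": p99,
        "std": statistics.pstdev(ordered),  # population std dev (assumed convention)
        "throughput": batch_size * 1000.0 / mean,  # samples per second
    }


print(round(1000.0 / 92.96, 2))  # reported mean latency -> 10.76 samples/sec
```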

7. Accuracy Validation

Run the accuracy validation script:

python3 accuracy.py assets/demo.png

Example accuracy output:

============================================================
RADIO-L NPU Accuracy Validation
============================================================

[1/3] Running CPU inference (reference)...
CPU summary shape:   torch.Size([1, 3072])
CPU features shape:  torch.Size([1, 3792, 1024])

[2/3] Running NPU inference...
NPU summary shape:   torch.Size([1, 3072])
NPU features shape:  torch.Size([1, 3792, 1024])

[3/3] Computing accuracy metrics...

============================================================
Accuracy Results
============================================================

Summary (global representation):
  relative_error      : 0.000715
  absolute_error      : 0.001531
  max_diff            : 0.010228
  cosine_similarity   : 0.999979

Features (spatial representation):
  relative_error      : 0.000409
  absolute_error      : 0.002329
  max_diff            : 0.100165
  cosine_similarity   : 1.000134

Accuracy check: PASSED (threshold: 1%)

The accuracy validation results are saved to output/accuracy.txt.
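The four metrics compare NPU outputs against the CPU reference element by element. This document does not define them, so the sketch below uses common definitions as assumptions. It will not necessarily reproduce the exact numbers above. A true cosine similarity cannot exceed 1, so the reported 1.000134 for features likely reflects floating-point accumulation in the original script. Accumulating in double precision with math.fsum, as here, keeps rounding error far below that level. To apply it to tensors, flatten them first, e.g. `t.detach().cpu().double().flatten().tolist()`. For tensors as large as features (about 3.9 million elements), a vectorized torch or NumPy version would be faster.

```python
import math


def accuracy_metrics(npu, cpu):
    """Compare two equally sized flat sequences of floats; cpu is the reference."""
    if len(npu) != len(cpu) or not cpu:
        raise ValueError("inputs must be non-empty and the same length")
    diffs = [abs(a - b) for a, b in zip(npu, cpu)]
    dot = math.fsum(a * b for a, b in zip(npu, cpu))
    norm_npu = math.sqrt(math.fsum(a * a for a in npu))
    norm_cpu = math.sqrt(math.fsum(b * b for b in cpu))
    return {
        # Assumed definitions; accuracy.py may define these differently.
        "relative_error": math.fsum(diffs) / math.fsum(abs(b) for b in cpu),
        "absolute_error": math.fsum(diffs) / len(diffs),
        "max_diff": max(diffs),
        "cosine_similarity": dot / (norm_npu * norm_cpu),
    }


def passes(metrics, threshold=0.01):
    """1% relative-error threshold, as stated in the accuracy output."""
    return metrics["relative_error"] < threshold
```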

8. Deliverables

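File           Description
inference.py   NPU inference script
benchmark.py   Performance benchmark script
accuracy.py    Accuracy validation script
output/        Run artifacts and logs
README.md      Adaptation documentation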

9. Notes

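  1. Loading the model requires trust_remote_code=True because RADIO-L uses a custom Hugging Face model class.
  2. The model is moved to the NPU directly with model.to("npu"), so no monkey-patching of third-party library code is required.
  3. Accuracy is validated with CPU inference as the reference. The NPU-vs-CPU relative error is 0.0715% for summary and 0.0409% for features, well below the 1% threshold.
  4. The input resolution must be an integer multiple of patch_size (16); the model handles this internally via get_nearest_supported_resolution.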
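Note 4 can be illustrated with a small rounding helper. The sketch below is hypothetical and snaps each side to the nearest multiple of 16. It is not the model's get_nearest_supported_resolution, whose exact rounding rule is not documented here. As an example, scaling the 1182×718 demo image so that its short side is 768 gives a long side of about 1264.3, which snaps to 1264. That is consistent with the pixel_values shape in section 5, though the actual preprocessing path is not confirmed.

```python
def nearest_supported_resolution(height, width, patch_size=16):
    """Snap each side to the nearest multiple of patch_size (hypothetical helper)."""
    def snap(x):
        # round() picks the nearest patch count; never go below one patch
        return max(patch_size, round(x / patch_size) * patch_size)
    return snap(height), snap(width)


print(nearest_supported_resolution(768, 1182 * 768 / 718))  # (768, 1264)
```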