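This document records the quick adaptation of nvidia/RADIO-B to Ascend NPUs and the validation results. RADIO-B is a vision foundation model from NVIDIA Research that produces a global summary feature (`summary`) and spatial features (`features`) for an image.

The model returns a tuple:

- `summary`: the global image representation, with shape `(B, C)`.
- `features`: the spatial features, with shape `(B, T, D)`, usable for dense prediction tasks or as input to an LLM.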
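
To make the `(B, T, D)` layout concrete, the sketch below maps a token index along `T` back to a cell in the patch grid. It assumes a square patch of 16 pixels and row-major token order; `token_grid` and `token_to_cell` are hypothetical helper names, not part of the RADIO-B API.

```python
# Illustrative only: assumes 16-pixel square patches and row-major token order.
PATCH_SIZE = 16


def token_grid(height: int, width: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Return (rows, cols) of the patch grid for a height x width pixel input."""
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    return height // patch_size, width // patch_size


def token_to_cell(index: int, height: int, width: int,
                  patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Map a token index along T to its (row, col) cell, assuming row-major order."""
    rows, cols = token_grid(height, width, patch_size)
    if not 0 <= index < rows * cols:
        raise IndexError("token index out of range")
    return divmod(index, cols)


rows, cols = token_grid(768, 1264)
print(rows, cols, rows * cols)
```

For a 768 x 1264 input the grid is 48 x 79, which gives 3792 tokens and matches the `Features shape` reported in the inference log.
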

Software environment:

| Component | Version |
|---|---|
| torch | 2.9.0+cpu |
| torch-npu | 2.9.0.post1+gitee7ba04 |
| transformers | 4.40.1 |
| timm | 1.0.27 |
| Pillow | latest |
| einops | latest |

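Hardware and paths:

- NPU: 2 logical cards (Ascend 910)
- Weights path: `/opt/atomgit/weight/RADIO-B`
- Toolkit version (presumably CANN): 8.5.1

Download the weights from AtomGit or ModelScope:

    python3 -m atomgit download hf_mirrors/nvidia/RADIO-B -d /opt/atomgit/weight/RADIO-B

    modelscope download --model nvidia/RADIO-B --local_dir /opt/atomgit/weight/RADIO-B

Install the dependencies:

    pip install torch transformers timm einops Pillow

Note: `torch-npu` is preinstalled in this environment and does not need to be installed separately.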
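
Before running the scripts, it can help to confirm which of the listed packages are actually installed. The following is a minimal sketch using only the standard library; the distribution names are taken from the version table, and `collect_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names from the version table; adjust if your index uses other names.
PACKAGES = ["torch", "torch-npu", "transformers", "timm", "Pillow", "einops"]


def collect_versions(names):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in collect_versions(PACKAGES).items():
    print(f"{name:<14}{version or 'NOT INSTALLED'}")
```
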

Run the inference script:

    python3 inference.py assets/demo.png

Output:

    ============================================================
    RADIO-B NPU Inference
    ============================================================
    [1/4] Loading image processor...
    [2/4] Loading model from local weights...
    [3/4] Moving model to NPU...
    [4/4] Loading and preprocessing image...
    Input image size: (1182, 718)
    Pixel values shape: torch.Size([1, 3, 768, 1264])
    Pixel values dtype: torch.float32
    Pixel values device: npu:0
    Running warmup...
    Running inference...
    ============================================================
    Results
    ============================================================
    Summary shape: torch.Size([1, 2304])
    Summary dtype: torch.float32
    Summary device: npu:0
    Summary mean: -0.008162
    Summary std: 0.277743
    Summary min: -1.517284
    Summary max: 1.460084
    Features shape: torch.Size([1, 3792, 768])
    Features dtype: torch.float32
    Features device: npu:0
    Features mean: 0.000119
    Features std: 0.367595
    Inference time: 34.80 ms

The inference outputs are saved to `output/summary.pt` and `output/features.pt`.
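
The four logged steps can be sketched roughly as follows. This is not the actual `inference.py`; it assumes the weights directory ships a CLIP-style preprocessor config loadable with `CLIPImageProcessor`, and `pick_device` and `run_inference` are hypothetical names.

```python
import os


def pick_device(npu_available: bool) -> str:
    """Choose the device string; fall back to CPU when no NPU is visible."""
    return "npu:0" if npu_available else "cpu"


def run_inference(image_path: str, weights_dir: str, out_dir: str = "output"):
    # Heavy imports stay local so this module can be imported without them.
    import torch
    from PIL import Image
    from transformers import AutoModel, CLIPImageProcessor

    try:
        import torch_npu  # noqa: F401  (registers the torch.npu backend)
        npu_available = torch.npu.is_available()
    except ImportError:
        npu_available = False
    device = pick_device(npu_available)

    # Assumption: the preprocessor config in weights_dir is CLIP-compatible.
    processor = CLIPImageProcessor.from_pretrained(weights_dir)
    # trust_remote_code=True is needed because RADIO-B defines custom model classes.
    model = AutoModel.from_pretrained(weights_dir, trust_remote_code=True)
    model = model.eval().to(device)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)

    with torch.no_grad():
        summary, features = model(pixel_values)

    os.makedirs(out_dir, exist_ok=True)
    torch.save(summary.cpu(), os.path.join(out_dir, "summary.pt"))
    torch.save(features.cpu(), os.path.join(out_dir, "features.pt"))
    return summary, features
```

Calling `run_inference("assets/demo.png", "/opt/atomgit/weight/RADIO-B")` would then return the `(summary, features)` tuple and write both tensors to `output/`.
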

Run the benchmark script:

    python3 benchmark.py assets/demo.png

Output:

    ============================================================
    RADIO-B NPU Benchmark
    ============================================================
    Batch size: 1
    Input shape: torch.Size([1, 3, 768, 1264])
    Device: npu:0
    Warmup 10 iterations...
    Benchmarking 50 iterations...
    ============================================================
    Benchmark Results
    ============================================================
    Mean latency: 32.77 ms
    Median latency: 33.05 ms
    Min latency: 32.22 ms
    Max latency: 33.29 ms
    P99 latency: 33.25 ms
    Std dev: 0.46 ms
    Throughput: 30.52 samples/sec

The benchmark results are saved to `output/benchmark.txt`.
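
The reported statistics can be derived from per-iteration timings along the following lines. This is a sketch, not the actual `benchmark.py`: the linear-interpolation P99 and the population standard deviation are assumptions, and `measure`, `percentile` and `summarize` are hypothetical names.

```python
import math
import statistics
import time


def measure(fn, warmup=10, iters=50, sync=None):
    """Time fn() in milliseconds; sync(), if given, flushes asynchronous device work."""
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync:
            sync()  # wait for queued kernels so the timing covers the real work
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(ordered, q):
    """Percentile of a sorted list using linear interpolation between ranks."""
    if not ordered:
        raise ValueError("need at least one sample")
    pos = q / 100.0 * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def summarize(latencies_ms, batch_size=1):
    """Aggregate per-iteration latencies into the metrics shown in the log."""
    ordered = sorted(latencies_ms)
    mean = statistics.fmean(ordered)
    return {
        "mean_ms": mean,
        "median_ms": statistics.median(ordered),
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "p99_ms": percentile(ordered, 99),
        "std_ms": statistics.pstdev(ordered),  # population std dev (assumption)
        "throughput": batch_size * 1000.0 / mean,  # samples per second
    }
```

On an NPU, `sync` would typically be `torch.npu.synchronize`, since kernels are launched asynchronously. The throughput in the log is consistent with this definition: 1000 / 32.77 ms is about 30.52 samples/sec at batch size 1.
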

Run the accuracy validation script:

    python3 accuracy.py assets/demo.png

Output:

    ============================================================
    RADIO-B NPU Accuracy Validation
    ============================================================
    [1/3] Running CPU inference (reference)...
    CPU summary shape: torch.Size([1, 2304])
    CPU features shape: torch.Size([1, 3792, 768])
    [2/3] Running NPU inference...
    NPU summary shape: torch.Size([1, 2304])
    NPU features shape: torch.Size([1, 3792, 768])
    [3/3] Computing accuracy metrics...
    ============================================================
    Accuracy Results
    ============================================================
    Summary (global representation):
    relative_error : 0.001247
    absolute_error : 0.001903
    max_diff : 0.009087
    cosine_similarity : 0.999964
    Features (spatial representation):
    relative_error : 0.000816
    absolute_error : 0.002178
    max_diff : 0.027094
    cosine_similarity : 1.000078
    Accuracy check: PASSED (threshold: 1%)

The features `cosine_similarity` of 1.000078 is marginally above the mathematical upper bound of 1; this is most likely float32 rounding error in the metric computation rather than a real difference between the outputs.

The accuracy validation results are saved to `output/accuracy.txt`.
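
The source does not state how `accuracy.py` defines its metrics, so the sketch below uses common definitions as an assumption: mean absolute error, that error relative to the mean absolute reference value, maximum element-wise difference, and cosine similarity clamped to [-1, 1]. `accuracy_metrics` is a hypothetical name, and it operates on flat Python sequences for clarity.

```python
import math


def accuracy_metrics(test, ref, eps=1e-12):
    """Compare two equal-length flat sequences of floats (e.g. NPU vs CPU outputs)."""
    if len(test) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(t - r) for t, r in zip(test, ref)]
    mean_abs = math.fsum(diffs) / len(diffs)
    ref_mean_abs = math.fsum(abs(r) for r in ref) / len(ref)
    dot = math.fsum(t * r for t, r in zip(test, ref))
    norm = math.sqrt(math.fsum(t * t for t in test)) * math.sqrt(math.fsum(r * r for r in ref))
    cosine = dot / max(norm, eps)
    return {
        "relative_error": mean_abs / max(ref_mean_abs, eps),
        "absolute_error": mean_abs,
        "max_diff": max(diffs),
        # Clamp so rounding error cannot push the value outside [-1, 1].
        "cosine_similarity": max(-1.0, min(1.0, cosine)),
    }
```

With tensors, the inputs could be obtained via `tensor.flatten().tolist()`. Accumulating in double precision with `math.fsum` and clamping the result keeps the cosine similarity within its valid range.
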
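
Files:

| File | Description |
|---|---|
| `inference.py` | NPU inference script |
| `benchmark.py` | Performance benchmark script |
| `accuracy.py` | Accuracy validation script |
| `output/` | Run outputs and logs |
| `README.md` | Adaptation document |
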
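Notes:

- Loading the model requires `trust_remote_code=True`, because RADIO-B uses custom Hugging Face model classes.
- `model.to("npu")` moves the model directly to the NPU; no monkey-patching of third-party library code is needed.
- The input resolution must be an integer multiple of `patch_size` (16); the model handles this internally via `get_nearest_supported_resolution`.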
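
To illustrate the patch-size constraint, the sketch below rounds a resolution to the nearest multiple of 16. It is a hypothetical stand-in and may not match what the model's `get_nearest_supported_resolution` does internally; `nearest_multiple` and `nearest_resolution` are illustrative names.

```python
def nearest_multiple(value: int, step: int = 16) -> int:
    """Round value to the nearest positive multiple of step (ties round up)."""
    if value <= 0 or step <= 0:
        raise ValueError("value and step must be positive")
    return max(1, (value + step // 2) // step) * step


def nearest_resolution(height: int, width: int, patch_size: int = 16) -> tuple[int, int]:
    """Hypothetical stand-in: snap (height, width) to multiples of patch_size."""
    return nearest_multiple(height, patch_size), nearest_multiple(width, patch_size)


print(nearest_resolution(768, 1264))
```

The preprocessed size from the inference log, 768 x 1264, is already a multiple of 16 in both dimensions, so this helper leaves it unchanged.
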