
AM-RADIO: Reduce All Domains Into One

Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov

NVIDIA Research

[AM-RADIO paper](https://arxiv.org/abs/2312.06709)

[PHI-S paper](https://arxiv.org/abs/2410.01680)

[BibTex](#citing-radio)

[GitHub examples](https://github.com/NVlabs/RADIO)

[v2.5 tech report](https://github.com/NVlabs/RADIO/blob/main/RADIOv2.5_tech_report.md)

HuggingFace Hub

You can pull the model from a Python script:

import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

hf_repo = "nvidia/RADIO-L"

image_processor = CLIPImageProcessor.from_pretrained(hf_repo)
model = AutoModel.from_pretrained(hf_repo, trust_remote_code=True)
model.eval().cuda()

image = Image.open('./assets/radio.png').convert('RGB')
pixel_values = image_processor(images=image, return_tensors='pt', do_resize=True).pixel_values
pixel_values = pixel_values.cuda()

# The model returns a (summary, spatial_features) tuple, described under Usage.
summary, spatial_features = model(pixel_values)

Usage

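RADIO-L returns a tuple of two tensors. The `summary` is similar to the `cls_token` in ViT and represents the general concept of the entire image. It has shape $(B, C)$, where $B$ is the batch dimension and $C$ is the number of channels. The `spatial_features` represent more localized content, which makes them suitable for dense tasks such as semantic segmentation or for integration into a large language model (LLM). They have shape $(B, T, D)$, where $T$ is the number of flattened spatial tokens and $D$ is the number of channels of the spatial features. Note that $C \neq D$ in general.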
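The shape contract above can be checked with a few lines of Python. This is only a sketch: `describe_outputs` is a hypothetical helper, not part of RADIO or `transformers`, and the arrays are NumPy placeholders with made-up sizes rather than real RADIO-L outputs. The same attribute checks should also work on PyTorch tensors.

```python
import numpy as np


def describe_outputs(summary, spatial_features):
    """Validate the documented shapes and return the named dimensions.

    Hypothetical helper for illustration; expects summary of shape (B, C)
    and spatial_features of shape (B, T, D).
    """
    if summary.ndim != 2:
        raise ValueError("summary should have shape (B, C)")
    if spatial_features.ndim != 3:
        raise ValueError("spatial_features should have shape (B, T, D)")
    if summary.shape[0] != spatial_features.shape[0]:
        raise ValueError("batch dimensions of the two outputs differ")
    b, c = summary.shape
    _, t, d = spatial_features.shape
    return {"B": b, "C": c, "T": t, "D": d}


# Placeholder arrays; the sizes are invented and are not RADIO-L's real dimensions.
summary = np.zeros((2, 12), dtype=np.float32)
spatial_features = np.zeros((2, 256, 8), dtype=np.float32)

dims = describe_outputs(summary, spatial_features)
print(dims)
```

Here $C = 12$ and $D = 8$ were chosen to differ on purpose, mirroring the note that the two channel counts are not equal in general.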

To convert the features to a spatial tensor format, combine the model's downsampling size with the shape of the input tensor. For 'radio_v1', the patch size is 14.

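from einops import rearrange
# pixel_values is the model input from the snippet above; patch_size is the model's patch size and must be defined first.
spatial_features = rearrange(spatial_features, 'b (h w) d -> b d h w', h=pixel_values.shape[-2] // patch_size, w=pixel_values.shape[-1] // patch_size)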

The resulting tensor has shape $(B, D, H, W)$, as is common in computer vision models.
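To make the token-to-grid arithmetic concrete, here is a minimal sketch that runs without downloading the model. It uses NumPy's `reshape` and `transpose` as a stand-in for the einops pattern `'b (h w) d -> b d h w'`. The helper name `tokens_to_spatial`, the 224x448 input size, and $D = 8$ are assumptions chosen for illustration, and the sketch assumes the model emits exactly one token per patch, so that $T = H \cdot W$.

```python
import numpy as np


def tokens_to_spatial(tokens, input_hw, patch_size):
    """Reshape (B, T, D) tokens into a (B, D, H, W) feature map.

    Hypothetical helper; assumes row-major token order and T == H * W.
    """
    b, t, d = tokens.shape
    h = input_hw[0] // patch_size
    w = input_hw[1] // patch_size
    if h * w != t:
        raise ValueError(f"expected {h * w} tokens for a {h}x{w} grid, got {t}")
    # (B, T, D) -> (B, H, W, D) -> (B, D, H, W)
    return tokens.reshape(b, h, w, d).transpose(0, 3, 1, 2)


# Dummy stand-in for spatial_features: 2 images of 224x448 pixels, patch size 14.
patch_size = 14
tokens = np.arange(2 * 16 * 32 * 8, dtype=np.float32).reshape(2, 16 * 32, 8)

spatial = tokens_to_spatial(tokens, (224, 448), patch_size)
print(spatial.shape)  # (2, 8, 16, 32)
```

With a patch size of 14, a 224x448 input gives $H = 224 / 14 = 16$ and $W = 448 / 14 = 32$, so $T = 512$. The token at flat index 32 therefore lands at row 1, column 0 of the grid.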

RADIOv2.5 Notes

See the RADIOv2.5 technical report.

License

RADIO code and weights are released under the NSCLv1 License.

Citing RADIO

If you find this repository useful, please consider giving it a star and citing:

@InProceedings{Ranzinger_2024_CVPR,
    author    = {Ranzinger, Mike and Heinrich, Greg and Kautz, Jan and Molchanov, Pavlo},
    title     = {AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {12490-12500}
}
@misc{ranzinger2024phisdistributionbalancinglabelfree,
      title={PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation}, 
      author={Mike Ranzinger and Jon Barker and Greg Heinrich and Pavlo Molchanov and Bryan Catanzaro and Andrew Tao},
      year={2024},
      eprint={2410.01680},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.01680}, 
}