ConvNeXT (large-sized model)

ConvNeXT model trained on ImageNet-1k at a resolution of 224x224. It was introduced in the paper "A ConvNet for the 2020s" by Liu et al. and first released in this repository.

Disclaimer: The team releasing ConvNeXT did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

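ConvNeXT is a pure convolutional model (ConvNet) whose design is inspired by Vision Transformers and which claims to outperform them. The authors started from a ResNet and "modernized" it by borrowing design ideas from the Swin Transformer.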

Model architecture diagram

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for versions fine-tuned on a task that interests you.

How to use

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

from transformers import ConvNextImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

processor = ConvNextImageProcessor.from_pretrained("facebook/convnext-large-224")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-large-224")

inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
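
The snippet above reports only the single most likely class. If a ranked shortlist is more useful, the logits can be turned into probabilities with a softmax and sorted. The following is a minimal sketch, not part of the original model card: the helper name `top_k_predictions` is hypothetical, and the toy logits and labels are made-up stand-ins for real model output.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs, best first.

    Hypothetical helper for illustration; `logits` is a flat list of floats
    and `id2label` maps class indices to human-readable names.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first, and keep the top k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Toy stand-in for model output (assumed values, three classes only).
toy_logits = [2.0, 0.5, 1.0]
toy_labels = {0: "tabby cat", 1: "golden retriever", 2: "tiger cat"}

top = top_k_predictions(toy_logits, toy_labels, k=2)
for label, prob in top:
    print(f"{label}: {prob:.3f}")
```

With the model loaded as shown earlier, the same helper could be called as `top_k_predictions(logits[0].tolist(), model.config.id2label)`.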

For more code examples, refer to the documentation.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2201-03545,
  author    = {Zhuang Liu and
               Hanzi Mao and
               Chao{-}Yuan Wu and
               Christoph Feichtenhofer and
               Trevor Darrell and
               Saining Xie},
  title     = {A ConvNet for the 2020s},
  journal   = {CoRR},
  volume    = {abs/2201.03545},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint    = {2201.03545},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}