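BigBird is a sparse-attention-based Transformer that extends Transformer-based models such as BERT to much longer sequences. The BigBird paper also gives a theoretical account of which capabilities of a full Transformer the sparse model retains.
BigBird was introduced in this paper ("Big Bird: Transformers for Longer Sequences", arXiv:2007.14062) and first released in this repository.
Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.
BigBird replaces regular full attention (such as BERT's) with block sparse attention, which lets it handle sequences of up to 4096 tokens at a much lower compute cost than BERT. The authors report state-of-the-art results, at the time of publication, on a range of tasks involving very long sequences, such as long-document summarization and question answering with long contexts.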
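To make the block sparse pattern more concrete, here is a simplified sketch in pure Python. It is an illustration only, not the transformers implementation. It assumes that each query block attends to a sliding window of three blocks, to the first and last blocks (treated as global), and to `num_random_blocks` randomly chosen blocks. The function name `block_sparse_mask` and its parameters are hypothetical.

```python
import random


def block_sparse_mask(num_blocks, num_random_blocks=3, window=3, num_global=1, seed=0):
    """Return a num_blocks x num_blocks matrix of booleans.

    mask[i][j] is True when query block i attends to key block j.
    Simplified illustration; assumes num_blocks is comfortably larger
    than window + 2 * num_global + num_random_blocks.
    """
    rng = random.Random(seed)
    half = window // 2
    mask = [[False] * num_blocks for _ in range(num_blocks)]

    for i in range(num_blocks):
        # Sliding window: the block itself and its immediate neighbours.
        for j in range(max(0, i - half), min(num_blocks, i + half + 1)):
            mask[i][j] = True
        # Global blocks: every query block sees the first and last blocks.
        for g in range(num_global):
            mask[i][g] = True
            mask[i][num_blocks - 1 - g] = True
        # Random blocks: a few extra blocks not already attended.
        candidates = [j for j in range(num_blocks) if not mask[i][j]]
        for j in rng.sample(candidates, min(num_random_blocks, len(candidates))):
            mask[i][j] = True

    # Global blocks themselves attend to every block.
    for g in range(num_global):
        mask[g] = [True] * num_blocks
        mask[num_blocks - 1 - g] = [True] * num_blocks
    return mask


seq_len, block_size = 4096, 64
num_blocks = seq_len // block_size
mask = block_sparse_mask(num_blocks)
attended = sum(sum(row) for row in mask)
density = attended / (num_blocks * num_blocks)
print(f"{num_blocks} blocks; {attended} of {num_blocks ** 2} block pairs attended ({density:.1%})")
```

Under these assumptions, every non-global query block attends to at most eight key blocks regardless of sequence length, which is why the cost grows roughly linearly with the number of blocks rather than quadratically.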
Here is how to use this model in PyTorch to generate a summary of a given text:
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")
# By default, the encoder uses `block_sparse` attention with num_random_blocks=3 and block_size=64.
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")
# The three `from_pretrained` calls in this snippet are alternatives; keep only the one you need.
# The decoder attention type cannot be changed and is always "original_full".
# To switch the encoder to full attention, set `attention_type`:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", attention_type="original_full")
# To change `block_size` and `num_random_blocks`:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", block_size=16, num_random_blocks=2)
text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the pubmed dataset from scientific_papers.
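The placeholder text in the usage snippet is only a few tokens long, so it is worth checking whether block sparse attention is actually in effect for a given input. The sketch below rests on our understanding of the transformers implementation, which should be verified against the installed version: the encoder falls back to `original_full` attention unless the input is longer than `(5 + 2 * num_random_blocks) * block_size` tokens, and inputs are padded internally to a multiple of `block_size`. Both helper names are hypothetical.

```python
def min_block_sparse_length(block_size=64, num_random_blocks=3):
    """Token count an input must exceed for block sparse attention to apply.

    Assumed breakdown: 2 global blocks + 3 sliding-window blocks
    + num_random_blocks random blocks + an equal-sized buffer.
    """
    return (5 + 2 * num_random_blocks) * block_size


def padded_length(seq_len, block_size=64):
    """Round seq_len up to the next multiple of block_size."""
    return -(-seq_len // block_size) * block_size


for block_size, num_random_blocks in [(64, 3), (16, 2)]:
    threshold = min_block_sparse_length(block_size, num_random_blocks)
    print(f"block_size={block_size}, num_random_blocks={num_random_blocks}: "
          f"inputs need more than {threshold} tokens")

print(padded_length(1000))  # 1000 tokens rounded up to a multiple of 64
```

With the default settings, short inputs such as the placeholder sentence would therefore run with full attention; the sparse pattern only matters for the long documents the model was designed for.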
@misc{zaheer2021big,
title={Big Bird: Transformers for Longer Sequences},
author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
year={2021},
eprint={2007.14062},
archivePrefix={arXiv},
primaryClass={cs.LG}
}