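FLAN-T5 base model card

Summary

If you already know T5, FLAN-T5 is better across the board. At the same parameter count, these models have additionally been fine-tuned on more than 1,000 tasks that also cover more languages. As the opening of the paper's abstract puts it:

"Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models."

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it were copied from the T5 model card.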

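Model Details

Model Description

Model type: Language model
Language(s) (NLP): English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
License: Apache 2.0
Related models: All FLAN-T5 checkpoints
Original checkpoints: All original FLAN-T5 checkpoints
Resources for more information: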

Usage

Below are some example scripts showing how to use the model in transformers:

Using the PyTorch model

Running the model on a CPU

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU

# pip install accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", device_map="auto")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", device_map="auto", torch_dtype=torch.float16)

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

INT8


# pip install bitsandbytes accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", device_map="auto", load_in_8bit=True)

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
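
As a rough guide to why the lower-precision variants above matter, the sketch below estimates weight memory as parameter count times bytes per parameter. The figure of roughly 248 million parameters for the base checkpoint is an assumption that this card does not state, and the estimate ignores activations, quantization overhead, and any layers that a quantization backend keeps in higher precision.

```python
# Back-of-the-envelope estimate of weight memory at each precision.
# ASSUMPTION: ~248M parameters for flan-t5-base (not stated in this card).
APPROX_PARAMS = 248_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_mib(APPROX_PARAMS, dtype):.0f} MiB")
```

Halving the bytes per parameter halves the weight footprint, which is the main reason to load in FP16 or INT8 on memory-constrained GPUs.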

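Uses

Direct Use and Downstream Use

The authors write in the model card of the original paper:

"The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning and question answering; advancing fairness and safety research; and understanding the limitations of current large language models."

See the research paper for further details.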
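
To make the zero-shot versus in-context few-shot distinction concrete, here is a minimal sketch that builds both kinds of prompt as plain strings. The helper name `build_prompt` and the `Q:`/`A:` layout are illustrative assumptions, not a format prescribed by the paper or by this card.

```python
from typing import Sequence, Tuple


def build_prompt(question: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt.

    The "Q:/A:" layout is a hypothetical format chosen for illustration.
    """
    # Each solved example becomes one "Q: ... / A: ..." block.
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    zero_shot = build_prompt("What is the capital of France?")
    few_shot = build_prompt(
        "What is the capital of France?",
        examples=[
            ("What is the capital of Italy?", "Rome"),
            ("What is the capital of Japan?", "Tokyo"),
        ],
    )
    print(zero_shot)
    print("---")
    print(few_shot)
```

Either string can be passed to the tokenizer in place of `input_text` in the usage scripts.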

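Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

The information in this section is copied from the model's official model card:

According to Rae et al. (2021), language models, including Flan-T5, can potentially be used for language generation in a harmful way. Flan-T5 should not be used in any application without a prior assessment of the safety and fairness concerns specific to that application.

Ethical considerations and risks

Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result, the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating biases inherent in the underlying data.

Known Limitations

Flan-T5 has not been tested in real-world applications.

Sensitive Use:

Flan-T5 should not be applied to any unacceptable use cases, e.g., generation of abusive speech.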

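Training Details

Training Data

The model was trained on a mixture of tasks, including those described in Figure 2 of the original paper.

Training Procedure

According to the model card from the original paper:

"These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size."

The model was trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.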

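Evaluation

Testing Data, Factors & Metrics

The authors evaluated the model on various tasks (1,836 in total) covering several languages. For the quantitative results and full details, see the research paper.

Results

For full results for FLAN-T5-Base, see Table 3 of the research paper.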

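Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips ≥ 4
Hours used: More information needed
Cloud Provider: GCP
Compute Region: More information needed
Carbon Emitted: More information needed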
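
For readers who want a feel for how such an estimate is put together, the sketch below shows the general shape of the arithmetic: energy used times the carbon intensity of the local grid. It is not the calculator's own implementation. The function name, the optional `pue` (data-centre overhead) factor, and every input value in the example call are placeholder assumptions, since this card does not report hours, region, or per-chip power.

```python
def estimate_co2e_kg(
    num_chips: int,
    chip_power_watts: float,
    hours: float,
    grid_kg_co2e_per_kwh: float,
    pue: float = 1.0,
) -> float:
    """Estimate emissions as energy used times grid carbon intensity."""
    # Total energy drawn by all chips, scaled by the overhead factor.
    energy_kwh = num_chips * chip_power_watts / 1000.0 * hours * pue
    return energy_kwh * grid_kg_co2e_per_kwh


if __name__ == "__main__":
    # PLACEHOLDER inputs for illustration only; none come from this card.
    estimate = estimate_co2e_kg(
        num_chips=4,
        chip_power_watts=200.0,
        hours=10.0,
        grid_kg_co2e_per_kwh=0.5,
    )
    print(f"Estimated emissions: {estimate:.2f} kg CO2e")
```

A real estimate would need the actual training hours and compute region, which are the fields marked as missing above.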

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2210.11416,
  doi = {10.48550/ARXIV.2210.11416},
  url = {https://arxiv.org/abs/2210.11416},
  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Scaling Instruction-Finetuned Language Models},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

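Model Recycling

Evaluation on 36 datasets using google/flan-t5-base as the base model yields an average score of 77.98, compared with 68.82 for google/t5-v1_1-base.

As of June 2, 2023, the model ranks first among all tested models for the google/t5-v1_1-base architecture.

Results: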

Dataset                 Score
20_newsgroup            86.2188
ag_news                 89.6667
amazon_reviews_multi    67.12
anli                    51.9688
boolq                   82.3242
cb                      78.5714
cola                    80.1534
copa                    75
dbpedia                 77.6667
esnli                   90.9507
financial_phrasebank    85.4
imdb                    93.324
isear                   72.425
mnli                    87.2457
mrpc                    89.4608
multirc                 62.3762
poem_sentiment          82.6923
qnli                    92.7878
qqp                     89.7724
rotten_tomatoes         89.0244
rte                     84.8375
sst2                    94.3807
sst_5bins               57.2851
stsb                    89.4759
trec_coarse             97.2
trec_fine               92.8
tweet_ev_emoji          46.848
tweet_ev_emotion        80.2252
tweet_ev_hate           54.9832
tweet_ev_irony          76.6582
tweet_ev_offensive      84.3023
tweet_ev_sentiment      70.6366
wic                     70.0627
wnli                    56.338
wsc                     53.8462
yahoo_answers           73.4
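
As a sanity check on the reported average, the sketch below recomputes the mean over the 36 per-dataset scores listed above. It assumes the reported figure is an unweighted mean, which the card does not state explicitly; the names `SCORES` and `mean_score` are illustrative.

```python
# Per-dataset scores copied from the results table above.
SCORES = {
    "20_newsgroup": 86.2188,
    "ag_news": 89.6667,
    "amazon_reviews_multi": 67.12,
    "anli": 51.9688,
    "boolq": 82.3242,
    "cb": 78.5714,
    "cola": 80.1534,
    "copa": 75,
    "dbpedia": 77.6667,
    "esnli": 90.9507,
    "financial_phrasebank": 85.4,
    "imdb": 93.324,
    "isear": 72.425,
    "mnli": 87.2457,
    "mrpc": 89.4608,
    "multirc": 62.3762,
    "poem_sentiment": 82.6923,
    "qnli": 92.7878,
    "qqp": 89.7724,
    "rotten_tomatoes": 89.0244,
    "rte": 84.8375,
    "sst2": 94.3807,
    "sst_5bins": 57.2851,
    "stsb": 89.4759,
    "trec_coarse": 97.2,
    "trec_fine": 92.8,
    "tweet_ev_emoji": 46.848,
    "tweet_ev_emotion": 80.2252,
    "tweet_ev_hate": 54.9832,
    "tweet_ev_irony": 76.6582,
    "tweet_ev_offensive": 84.3023,
    "tweet_ev_sentiment": 70.6366,
    "wic": 70.0627,
    "wnli": 56.338,
    "wsc": 53.8462,
    "yahoo_answers": 73.4,
}


def mean_score(scores: dict) -> float:
    """Return the unweighted mean of the per-dataset scores."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(f"{len(SCORES)} datasets, mean score {mean_score(SCORES):.2f}")
```

Under that assumption the recomputed mean rounds to 77.98, matching the average quoted above.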

For more information, see: Model Recycling


