
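We present BLOOMZ and mT0, a family of models capable of following human instructions zero-shot in dozens of languages. We finetune the pretrained multilingual language models BLOOM and mT5 on our crosslingual task mixture (xP3) and find that the resulting models generalize crosslingually to unseen tasks and languages.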
| Multitask finetuned on xP3. Recommended for prompting in English. | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
| Finetuned Model | mt0-small | mt0-base | mt0-large | mt0-xl | mt0-xxl | bloomz-560m | bloomz-1b1 | bloomz-1b7 | bloomz-3b | bloomz-7b1 | bloomz |
| Multitask finetuned on xP3mt. Recommended for prompting in non-English. | | | | | | | | | | | |
| Finetuned Model | | | | | mt0-xxl-mt | | | | | bloomz-7b1-mt | bloomz-mt |
| Multitask finetuned on P3. Released for research purposes only. Strictly inferior to the models above! | | | | | | | | | | | |
| Finetuned Model | | | | | mt0-xxl-p3 | | | | | bloomz-7b1-p3 | bloomz-p3 |
| Original pretrained checkpoints. Not recommended. | | | | | | | | | | | |
| Pretrained Model | mt5-small | mt5-base | mt5-large | mt5-xl | mt5-xxl | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-3b | bloom-7b1 | bloom |
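The recommendations in the table can be written down as a small lookup. The sketch below is illustrative only: `recommend_checkpoint` and `MT_VARIANTS` are hypothetical names that do not belong to any library, and the `bigscience/` prefix is assumed from the checkpoint IDs used in the usage examples. For non-English prompts it returns the xP3mt variant where the table lists one and otherwise falls back to the xP3 model.

```python
# Hypothetical helper; names are illustrative and not part of transformers.
# xP3 models that have an xP3mt counterpart according to the table.
MT_VARIANTS = {
    "mt0-xxl": "mt0-xxl-mt",
    "bloomz-7b1": "bloomz-7b1-mt",
    "bloomz": "bloomz-mt",
}


def recommend_checkpoint(model: str, english_prompt: bool = True) -> str:
    """Return a hub ID that follows the table's recommendation."""
    if not english_prompt:
        # Switch to the xP3mt variant only if one exists for this size.
        model = MT_VARIANTS.get(model, model)
    return f"bigscience/{model}"  # prefix assumed from the usage examples


print(recommend_checkpoint("mt0-xxl", english_prompt=False))    # bigscience/mt0-xxl-mt
print(recommend_checkpoint("mt0-small", english_prompt=False))  # bigscience/mt0-small
```

Sizes without an xP3mt variant keep their xP3 checkpoint, so the helper never returns a name that is absent from the table.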
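We recommend using the model to perform tasks expressed in natural language. For example, given the prompt "Translate to English: Je t’aime.", the model will most likely answer "I love you.". Further prompt ideas can be found in our paper.
Feel free to share your generations in the Community tab!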
import argparse
import time

import torch
from openmind import AutoTokenizer, is_torch_npu_available
from transformers import AutoModelForSeq2SeqLM


def parse_args():
    parser = argparse.ArgumentParser(description="Eval the model")
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Local path or hub ID of the model",
        default="jeffding/mt0-small-openmind",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Use the first NPU if one is available, otherwise fall back to CPU.
    device = "npu:0" if is_torch_npu_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path).to(device)

    data = "translate English to German:That is good."

    # Time tokenization, generation, and decoding together.
    start_time = time.time()
    encoded = tokenizer([data], return_tensors="pt").to(device)
    translation = model.generate(**encoded)
    result = tokenizer.batch_decode(translation, skip_special_tokens=True)[0]
    print(result)
    end_time = time.time()
    print(f"Device: {device}, inference time: {end_time - start_time} s")


if __name__ == "__main__":
    main()

# pip install -q transformers
# Run on CPU.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

# pip install -q transformers accelerate
# Run on GPU; device_map="auto" relies on accelerate.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

# pip install -q transformers accelerate bitsandbytes
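# Run on GPU with 8-bit weights; load_in_8bit relies on bitsandbytes.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)

inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

Prompt engineering: Performance may vary depending on the prompt. For BLOOMZ models, we recommend making it very clear where the input ends, so that the model does not try to continue it. For example, the prompt "Translate to English: Je t'aime" without the full stop (.) at the end may lead the model to continue the French sentence. Better prompts include "Translate to English: Je t'aime.", "Translate to English: Je t'aime. Translation:", and "What is "Je t'aime." in English?", which make it clear to the model when it should answer. We also recommend giving the model as much context as possible. For example, if you want it to answer in Telugu, tell it so: "Explain in a sentence in Telugu what is backpropagation in neural networks.".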
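The advice on marking where the input ends can be applied mechanically before a prompt is sent to the model. The following is a minimal sketch, assuming a hypothetical helper `close_prompt` (not part of any library) and a deliberately small, non-exhaustive set of terminal punctuation marks.

```python
# Hypothetical helper; the punctuation set is an assumption and not exhaustive.
TERMINATORS = (".", "?", "!", ":", "。", "?", "!")


def close_prompt(prompt: str, suffix: str = ".") -> str:
    """Append `suffix` unless the prompt already ends with terminal punctuation."""
    stripped = prompt.rstrip()
    if stripped.endswith(TERMINATORS):
        return stripped
    return stripped + suffix


print(close_prompt("Translate to English: Je t'aime"))
# Translate to English: Je t'aime.
print(close_prompt("Translate to English: Je t'aime", suffix=". Translation:"))
# Translate to English: Je t'aime. Translation:
print(close_prompt('What is "Je t\'aime." in English?'))
# What is "Je t'aime." in English?
```

The helper only guards against a missing end marker. It cannot add missing context, such as the language the answer should be written in.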
For zero-shot results on unseen tasks, see Table 7 of our paper and bigscience/evaluation-results. The sidebar reports the zero-shot performance of the best prompt per dataset configuration.
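Reporting the best prompt per dataset configuration amounts to taking a maximum over prompts within each (dataset, config) pair. The sketch below illustrates that aggregation only: the function name, the record layout, and every score are invented placeholders, not actual evaluation results or the format used by bigscience/evaluation-results.

```python
# Placeholder records invented for illustration; these are NOT real results.
# Each record is (dataset, config, prompt_name, score).
records = [
    ("dataset_x", "en", "prompt_a", 0.50),
    ("dataset_x", "en", "prompt_b", 0.55),
    ("dataset_x", "fr", "prompt_a", 0.48),
    ("dataset_x", "fr", "prompt_b", 0.45),
]


def best_prompt_per_config(rows):
    """Map (dataset, config) to the (prompt_name, score) pair with the highest score."""
    best = {}
    for dataset, config, prompt, score in rows:
        key = (dataset, config)
        # Keep the first record seen, then replace it only on a strictly higher score.
        if key not in best or score > best[key][1]:
            best[key] = (prompt, score)
    return best


for key, (prompt, score) in best_prompt_per_config(records).items():
    print(key, prompt, score)
```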
@article{muennighoff2022crosslingual,
title={Crosslingual generalization through multitask finetuning},
author={Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Scao, Teven Le and Bari, M Saiful and Shen, Sheng and Yong, Zheng-Xin and Schoelkopf, Hailey and others},
journal={arXiv preprint arXiv:2211.01786},
year={2022}
}