Fork of FLAN-T5 XXL

This is a fork of google/flan-t5-xxl that implements a custom handler.py as an example of how to use t5-11b with inference-endpoints on a single NVIDIA A10G.
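For readers who have not seen a custom handler before, the following is a minimal sketch of what such a handler.py could look like. It is not the file shipped in this repository. It assumes the Inference Endpoints convention of an `EndpointHandler` class with `__init__(path)` and `__call__(data)`, a payload of the form `{"inputs": ..., "parameters": {...}}`, and 8-bit loading through `transformers` with `bitsandbytes` and `accelerate` installed. The helper `split_payload` is a hypothetical name introduced only for illustration, and the sketch handles a single prompt string rather than a batch.

```python
from typing import Any, Dict, List, Tuple


def split_payload(data: Dict[str, Any]) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical helper: separate the prompt from optional generation parameters."""
    inputs = data.get("inputs", data)
    parameters = data.get("parameters") or {}
    return inputs, parameters


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Imported lazily so the module can be imported without the heavy dependencies.
        from transformers import (
            AutoModelForSeq2SeqLM,
            AutoTokenizer,
            BitsAndBytesConfig,
        )

        self.tokenizer = AutoTokenizer.from_pretrained(path)
        # Assumption: weights are loaded in 8-bit so they fit on one 24 GB GPU.
        self.model = AutoModelForSeq2SeqLM.from_pretrained(
            path,
            device_map="auto",
            quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        inputs, parameters = split_payload(data)
        # Tokenize one prompt and move it to the device that holds the model.
        input_ids = self.tokenizer(inputs, return_tensors="pt").input_ids
        input_ids = input_ids.to(self.model.device)
        # Generation parameters from the request are forwarded unchanged.
        output_ids = self.model.generate(input_ids, **parameters)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```

Keeping the payload parsing in a small pure function makes it possible to test the request format without loading the model.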

You can deploy flan-t5-xxl with one click. Because this fork uses the "quantized" version, the instance type can be switched to "GPU [medium] · 1x Nvidia A10G".
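A rough back-of-the-envelope estimate shows why quantization matters for this instance type. The sketch below counts memory for the weights only, and it assumes about 11 billion parameters for the model and a nominal 24 GB of memory on the A10G. Activations, the generation cache, and framework overhead are not included, so real usage will be higher.

```python
GIB = 1024 ** 3

NUM_PARAMS = 11e9        # assumption: roughly 11 billion parameters
A10G_MEMORY_GIB = 24     # assumption: nominal memory of one A10G


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    need = weight_memory_gib(NUM_PARAMS, nbytes)
    headroom = A10G_MEMORY_GIB - need
    print(f"{name}: weights ~{need:.1f} GiB, headroom ~{headroom:.1f} GiB")
```

Under these assumptions, fp32 weights do not fit on the card at all, fp16 weights leave only a few GiB free, and 8-bit weights leave more than half of the memory available for inference.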


Usage

import argparse
import time

from openmind import is_torch_npu_available, pipeline


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="zhouhui/flan-t5-xxl-sharded-fp16",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Run on an NPU when one is available; otherwise fall back to CPU.
    device = "npu:0" if is_torch_npu_available() else "cpu"

    # The timer covers both model loading and generation.
    start_time = time.time()
    generator = pipeline(
        "text2text-generation",
        model=model_path,
        device=device,
        trust_remote_code=True,
    )
    output = generator(
        "They were there to enjoy us and they were there to pray for us.",
        do_sample=True,
        min_length=50,
    )
    print(f">>>output={output}")
    end_time = time.time()
    print(f"Device: {device}, inference time: {end_time - start_time} seconds")


if __name__ == "__main__":
    main()
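The script reports a single wall-clock figure that includes model loading as well as generation. If the two need to be measured separately, one option is a small timing context manager. The sketch below is illustrative only: `timed` is a hypothetical helper, and the `time.sleep` calls stand in for the `pipeline(...)` and `generator(...)` calls.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings: Dict[str, float] = {}

with timed("load", timings):
    time.sleep(0.01)  # stand-in for: generator = pipeline(...)

with timed("generate", timings):
    time.sleep(0.01)  # stand-in for: output = generator(...)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```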

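TL;DR

If you already know T5, FLAN-T5 is simply better at everything. For the same number of parameters, these models were fine-tuned on more than 1,000 additional tasks and cover more languages.

As stated in the first few lines of the abstract:

Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.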
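"Few-shot" here means that the prompt contains a handful of solved examples before the actual question. The following sketch shows one way to assemble such a prompt as plain text. The `Q:`/`A:` layout and the function name `build_few_shot_prompt` are assumptions made for illustration; they are not the template used in the paper's evaluations.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Join solved (question, answer) pairs and a final unanswered question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final block ends with "A:" so the model continues with the answer.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Italy?")
print(prompt)
```

The resulting string can be passed as the input text to a text2text-generation pipeline such as the one in the usage script.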

Disclaimer: The content of this model card was written by the Hugging Face team, and parts of it were copied and pasted from the T5 model card.

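Model Details

Model Description

Model type: Language model
Language(s) (NLP): English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Punjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
License: Apache 2.0
Related models: All FLAN-T5 checkpoints
Original checkpoints: All original FLAN-T5 checkpoints
Resources for more information: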
