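Model Card for ke-t5-base

Model Details

Model Description

The developers of the Text-To-Text Transfer Transformer (T5) write:

> With T5, we propose reframing all NLP tasks into a unified text-to-text format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.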
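To make the text-to-text format concrete, here is a minimal sketch of how different tasks could be cast as plain string pairs. The task prefixes follow the conventions of the original T5 paper; whether the ke-t5-base checkpoint responds to these exact prefixes is an assumption, and the helper name `to_text_to_text` is hypothetical.

```python
# Minimal sketch: every task becomes an (input string, target string) pair.
# The prefixes mirror the original T5 paper and are an assumption for
# ke-t5-base; `to_text_to_text` is a hypothetical helper, not a library API.

def to_text_to_text(prefix: str, text: str) -> str:
    """Prepend a task prefix so a single model can tell tasks apart."""
    return f"{prefix}: {text}"

examples = [
    # (model input, target text the model is trained to emit)
    (to_text_to_text("translate English to German", "That is good."), "Das ist gut."),
    (to_text_to_text("cola sentence", "The course is jumping well."), "not acceptable"),
]

for model_input, target in examples:
    print(f"{model_input!r} -> {target!r}")
```

Both the translation and the classification label are produced as ordinary text, which is what lets the same loss function cover both tasks.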

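T5-Base is the checkpoint with 220 million parameters.

- Developed by: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
- Shared by [Optional]: Korea Electronics Technology Institute Artificial Intelligence Research Center
- Model type: Text generation
- Language(s) (NLP): More information needed
- License: More information needed
- Related models:
  - Parent model: T5
- Resources for more information:
  - GitHub Repo
  - KE-T5 Github Repo
  - Paper
  - Associated Paper
  - Blog Post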

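Uses

Direct Use

The developers write in a blog post:

> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.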
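The regression idea in the quote can be sketched as a pair of conversions between numbers and strings. The rounding step of 0.2 follows the treatment of the STS-B similarity task in the original T5 paper; the function names are hypothetical, and nothing here is specific to ke-t5-base.

```python
# Minimal sketch of regression as text generation: the training target is
# the string form of a rounded score, and the model output is parsed back.
# Function names are hypothetical; the 0.2 step is borrowed from the T5
# paper's handling of STS-B.

def score_to_target(score: float, step: float = 0.2) -> str:
    """Round a score to the nearest multiple of `step` and render it as text."""
    return f"{round(score / step) * step:.1f}"

def target_to_score(text: str):
    """Parse generated text back to a float, or None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None

print(score_to_target(2.57))
print(target_to_score("2.6"))
print(target_to_score("not a number"))
```

Returning `None` for unparseable text is one possible design choice; it lets an evaluation loop count such generations as errors instead of crashing.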

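Downstream Use [Optional]

More information needed

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.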

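Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive social and occupational groups.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.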

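Training Details

Training Data

The model is pre-trained on the Colossal Clean Crawled Corpus (C4), which was developed and released in the context of the same research paper as T5.

The model was pre-trained on a multi-task mixture of unsupervised and supervised tasks.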
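For readers unfamiliar with the unsupervised part of such a mixture, the original T5 paper uses a span-corruption (denoising) objective: spans of the input are replaced with sentinel tokens and the model learns to emit the dropped spans. The sketch below illustrates that idea only. The spans are chosen by hand rather than sampled, the `<extra_id_N>` sentinel names follow the Hugging Face T5 tokenizer convention, and whether ke-t5-base was trained with exactly these sentinels is an assumption.

```python
# Illustrative sketch of T5-style span corruption. `span_corrupt` is a
# hypothetical helper; real pre-training samples spans randomly and works
# on subword IDs, not whitespace-separated words.

def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel.

    Returns (corrupted_input, target) as space-joined strings.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])   # keep text before the span
        inputs.append(sentinel)               # mark where the span was
        targets.append(sentinel)              # target repeats the marker...
        targets.extend(tokens[start:end])     # ...followed by the dropped words
        cursor = end
    inputs.extend(tokens[cursor:])            # keep the remaining text
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return " ".join(inputs), " ".join(targets)

tokens = "Thank you for inviting me to your party last week .".split()
corrupted, target = span_corrupt(tokens, [(2, 4), (8, 9)])
print(corrupted)
print(target)
```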

Training Procedure

Preprocessing

More information needed

Speeds, Sizes, Times

More information needed

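Evaluation

Testing Data, Factors & Metrics

Testing Data

The developers evaluated the model on 24 tasks; see the research paper for full details.

Factors

More information needed

Metrics

More information needed

Results

For full results for T5-Base, see Table 14 of the research paper.

Model Examination

More information needed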

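Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- Hardware Type: Google Cloud TPU Pods
- Hours used: More information needed
- Cloud Provider: GCP
- Compute Region: More information needed
- Carbon Emitted: More information needed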
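As a rough illustration of the kind of estimate such a calculator produces, emissions can be approximated as the energy consumed multiplied by the carbon intensity of the local grid. This is a simplified formula, not the calculator's exact method, and every number below is a hypothetical placeholder, since this card does not report hours used or a compute region.

```python
# Back-of-the-envelope carbon estimate:
#   emissions (kg CO2eq) = power (kW) * time (h) * grid intensity (kg CO2eq/kWh)
# The function name and all input values are hypothetical placeholders.

def estimate_emissions_kg(power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Estimate emissions in kg CO2eq from average power draw and runtime."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2_per_kwh

# Placeholder inputs: one 250 W accelerator for 100 hours on a
# grid emitting 0.4 kg CO2eq per kWh.
example_kg = estimate_emissions_kg(power_watts=250, hours=100, kg_co2_per_kwh=0.4)
print(f"{example_kg:.1f} kg CO2eq")
```

A fuller estimate would also account for the number of devices, data-center overhead, and any offsets purchased by the cloud provider.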

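Technical Specifications [Optional]

Model Architecture and Objective

More information needed

Compute Infrastructure

More information needed

Hardware

More information needed

Software

More information needed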

Citation

BibTeX:

@inproceedings{kim-etal-2021-model-cross,
    title = "A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems",
    author = "Kim, San  and
      Jang, Jin Yea  and
      Jung, Minyoung  and
      Shin, Saim",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.33",
    doi = "10.18653/v1/2021.findings-emnlp.33",
    pages = "352--365",
    abstract = "Research on open-domain dialogue systems that allow free topics is challenging in the field of natural language processing (NLP). The performance of the dialogue system has been improved recently by the method utilizing dialogue-related knowledge; however, non-English dialogue systems suffer from reproducing the performance of English dialogue systems because securing knowledge in the same language with the dialogue system is relatively difficult. Through experiments with a Korean dialogue system, this paper proves that the performance of a non-English dialogue system can be improved by utilizing English knowledge, highlighting the system uses cross-lingual knowledge. For the experiments, we 1) constructed a Korean version of the Wizard of Wikipedia dataset, 2) built Korean-English T5 (KE-T5), a language model pre-trained with Korean and English corpus, and 3) developed a knowledge-grounded Korean dialogue model based on KE-T5. We observed the performance improvement in the open-domain Korean dialogue model even only English knowledge was given. The experimental results showed that the knowledge inherent in cross-lingual language models can be helpful for generating responses in open dialogue systems.",
}

@article{2020t5,
  author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {140},
  pages   = {1-67},
  url     = {http://jmlr.org/papers/v21/20-074.html}
}

APA:

- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.

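Glossary [Optional]

More information needed

Model Card Authors [Optional]

Korea Electronics Technology Institute Artificial Intelligence Research Center, in collaboration with Ezi Ozoani and the Hugging Face team

Model Card Contact

More information needed

How to Get Started with the Model

Use the code below to get started with the model.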

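from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
import torch_npu  # Ascend adapter; importing it enables the "npu" device type in PyTorch

# Run on the first Ascend NPU.
device = torch.device('npu:0')

# Load the tokenizer and half-precision weights from the current directory,
# which is expected to contain the downloaded checkpoint files.
tokenizer = T5Tokenizer.from_pretrained("./")
model = T5ForConditionalGeneration.from_pretrained("./", torch_dtype=torch.float16).to(device)

# Tokenize a prompt, generate on the device, and decode the output tokens.
input_ids = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt").input_ids
outputs = model.generate(input_ids.to(device))
print(tokenizer.decode(outputs[0], skip_special_tokens=True))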
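The snippet above assumes an Ascend NPU with `torch_npu` installed. On machines without one, a device name could be chosen at runtime instead. The following is a minimal sketch under that assumption; `pick_device_name` is a hypothetical helper, and it only checks whether the packages are importable, not whether NPU hardware is actually present.

```python
# Hypothetical helper: choose a device string based on what is installed.
# An importable torch_npu does not guarantee that an NPU is attached.
import importlib.util

def pick_device_name() -> str:
    """Return a device string, preferring NPU, then CUDA, then CPU."""
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu:0"
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda:0"
    return "cpu"

print(pick_device_name())
```

The returned string can be passed to `torch.device(...)` in place of the hard-coded `'npu:0'`. When it falls back to `"cpu"`, loading the model in `torch.float32` rather than `torch.float16` may be the safer choice, since half precision is not well supported for every CPU operation.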
