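GPT-neo-1.3B: a transformer model based on EleutherAI's replication of the GPT-3 architecture that generates text from a prompt. It supports NPUs, performs well on text-generation tasks, was trained on the Pile dataset, and is suited to downstream text-generation applications. [This summary was generated by AI.]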

Model Description

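GPT-Neo 1.3B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. "GPT-Neo" refers to the class of models, and "1.3B" is the number of parameters in this particular pretrained model.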
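To make "1.3B" concrete, the sketch below estimates the parameter count with a common rule of thumb for GPT-style decoders: about 12·d² weights per layer (4·d² for the attention Q, K, V and output projections, 8·d² for a feed-forward block that expands d to 4d and back), plus the token and position embeddings. The configuration values (24 layers, hidden size 2048, vocabulary of 50257, 2048 positions) are assumptions taken from the publicly released GPT-Neo 1.3B configuration, not from this card. The sketch also ignores biases and LayerNorm weights and assumes the output layer shares weights with the token embedding, so it is only an approximation.

```python
def approx_decoder_params(n_layer: int, d_model: int, vocab_size: int, n_positions: int) -> int:
    """Rough parameter count for a GPT-style decoder-only transformer.

    Ignores biases and LayerNorm parameters and assumes the output
    projection is tied to the token embedding matrix.
    """
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    mlp = 2 * d_model * (4 * d_model)      # d -> 4d -> d feed-forward block
    per_layer = attention + mlp            # = 12 * d_model**2
    embeddings = vocab_size * d_model + n_positions * d_model  # token + position tables
    return n_layer * per_layer + embeddings


# Assumed GPT-Neo 1.3B configuration values (not stated in this card)
n = approx_decoder_params(n_layer=24, d_model=2048, vocab_size=50257, n_positions=2048)
print(f"{n:,} parameters (~{n / 1e9:.2f}B)")
```

With these assumed values the estimate is about 1.32 billion parameters, consistent with the "1.3B" in the model name.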

Modifications

Modified the example and added NPU support
Added dependencies

Training Data

GPT-Neo 1.3B was trained on the Pile, a large-scale curated dataset created by EleutherAI for the purpose of training this model.

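Training Procedure

This model was trained on the Pile for 380 billion tokens over 362,000 steps. It was trained as a masked (causal) autoregressive language model, in which each token is predicted only from the tokens before it, using a cross-entropy loss.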
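The training objective can be illustrated with a small, dependency-free sketch. In a causal autoregressive model, the output at position t may only attend to tokens 0..t, and it is scored with cross-entropy against the token at position t + 1. The functions below assume the model has already produced one logit vector per position; `causal_lm_loss` and `log_softmax` are hypothetical helpers written for illustration, not part of any library. As a side note on the figures above, 380 billion tokens over 362,000 steps works out to roughly 1.05 million tokens per optimizer step.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def causal_lm_loss(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Mean next-token cross-entropy in nats.

    logits[t] is assumed to come from a model that has seen only
    token_ids[:t + 1] (the causal mask); it is scored against the next
    token, token_ids[t + 1]. The last position has no target.
    """
    if len(logits) != len(token_ids) or len(token_ids) < 2:
        raise ValueError("need one logit vector per token and at least two tokens")
    losses = []
    for t in range(len(token_ids) - 1):
        log_probs = log_softmax(logits[t])
        losses.append(-log_probs[token_ids[t + 1]])
    return sum(losses) / len(losses)


# A model that outputs uniform logits over a 4-token vocabulary
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
loss = causal_lm_loss(uniform, [0, 1, 2])
print(loss, math.exp(loss))  # loss equals ln 4, so perplexity is 4
```

A uniform guess over V tokens always costs ln V nats per token, which is a handy sanity check for any loss implementation.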

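Intended Use and Limitations

Through this training, the model learns an internal representation of the English language that can then be used to extract features useful for downstream tasks. However, the model performs best at what it was pretrained for: generating text from a prompt.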

Dependencies

transformers==4.44.2
psutil==6.0.0
better_profanity==0.7.0
einops==0.6.1
protobuf==5.28.2
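Because the versions above are pinned exactly, one way to confirm that an environment matches them is to compare against the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); `check_pins` is a hypothetical helper, and its exact string comparison is deliberately strict, so any other release (even a patch release) is reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Pinned versions copied from the dependency list above
PINNED: Dict[str, str] = {
    "transformers": "4.44.2",
    "psutil": "6.0.0",
    "better_profanity": "0.7.0",
    "einops": "0.6.1",
    "protobuf": "5.28.2",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for every unmet pin."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    if not problems:
        print("All pinned dependencies match.")
    for pkg, found in problems.items():
        print(f"{pkg}: expected {PINNED[pkg]}, found {found or 'not installed'}")
```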

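Usage

You can use this model directly with a text-generation pipeline. This example generates different sequences each time it is run (only the key parts of the code are shown):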

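```python
from openmind import pipeline, is_torch_npu_available

# Run on the first Ascend NPU when available, otherwise fall back to the CPU
if is_torch_npu_available():
    device = "npu:0"
else:
    device = "cpu"

generator = pipeline("text-generation", model="SY_AICC/GPT-neo-1.3B", device=device)

# do_sample=True enables random sampling, so each run yields different text;
# returning several sequences (num_return_sequences > 1) also requires sampling or beam search
output = generator("Hello, I'm a language model,", do_sample=True, max_length=30, num_return_sequences=5)
print(f">>>output={output}", flush=True)
```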
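Whether each run yields different sequences depends on sampling. In Transformers-style generation, `do_sample=False` picks the highest-scoring token at every step (greedy decoding, which is deterministic), while `do_sample=True` draws the next token at random from the softmax distribution over the logits. The stdlib-only sketch below shows that single-step choice; `next_token` is a hypothetical helper, and it omits refinements such as top-k or top-p filtering that generation libraries typically apply.

```python
import math
import random
from typing import Optional, Sequence


def next_token(logits: Sequence[float], do_sample: bool,
               temperature: float = 1.0,
               rng: Optional[random.Random] = None) -> int:
    """Pick the next token id from one decoding step's logits."""
    if not do_sample:
        # Greedy decoding: always the arg-max, so repeated runs give identical text
        return max(range(len(logits)), key=lambda i: logits[i])
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    # random.choices normalizes the weights, i.e. samples from softmax(logits / T)
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.5, -1.0]
print([next_token(logits, do_sample=False) for _ in range(5)])  # [0, 0, 0, 0, 0]
sampler = random.Random()
print([next_token(logits, do_sample=True, rng=sampler) for _ in range(5)])  # varies between runs
```

Lower temperatures sharpen the distribution toward the greedy choice; higher temperatures flatten it and increase variety.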

Evaluation Results

Linguistic Reasoning

Model and Size	Pile BPB	Pile PPL	Wikitext PPL	Lambada PPL	Lambada Acc	Winogrande	Hellaswag
GPT-Neo 1.3B	0.7527	6.159	13.10	7.498	57.23%	55.01%	38.66%
GPT-2 1.5B	1.0468	-----	17.48	10.634	51.21%	59.40%	40.03%
GPT-Neo 2.7B	0.7165	5.646	11.39	5.626	62.22%	56.50%	42.73%
GPT-3 Ada	0.9631	-----	-----	9.954	51.60%	52.90%	35.93%
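The BPB and PPL columns are two views of the same quantity, the model's negative log-likelihood on held-out text (lower is better for both). Perplexity is the exponential of the mean loss per token in nats, while bits per byte converts the total loss to bits and divides by the number of UTF-8 bytes. The sketch below encodes these definitions. It assumes that Pile PPL and Pile BPB were measured on the same text with the GPT-2 BPE tokenizer that GPT-Neo uses, which the table does not state. Under that assumption, ln(PPL) / (BPB · ln 2) is the average number of bytes per token, and it comes out at about 3.5 for both GPT-Neo rows, as expected for models that share a tokenizer.

```python
import math

LN2 = math.log(2)


def perplexity(total_nll_nats: float, n_tokens: int) -> float:
    """Token-level perplexity: exp of the mean negative log-likelihood per token."""
    return math.exp(total_nll_nats / n_tokens)


def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Total negative log-likelihood converted to bits, divided by the UTF-8 byte count."""
    return total_nll_nats / (n_bytes * LN2)


def implied_bytes_per_token(ppl: float, bpb: float) -> float:
    """Bytes per token implied by a PPL/BPB pair measured on the same text."""
    nats_per_token = math.log(ppl)
    nats_per_byte = bpb * LN2
    return nats_per_token / nats_per_byte


# PPL and BPB values copied from the Linguistic Reasoning table
for name, ppl, bpb in [("GPT-Neo 1.3B", 6.159, 0.7527), ("GPT-Neo 2.7B", 5.646, 0.7165)]:
    print(f"{name}: {implied_bytes_per_token(ppl, bpb):.3f} bytes/token")
```

Because BPB normalizes by bytes rather than tokens, it can be compared across models with different tokenizers, whereas token-level perplexity cannot.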

Physical and Scientific Reasoning

Model and Size	MathQA	PubMedQA	Piqa
GPT-Neo 1.3B	24.05%	54.40%	71.11%
GPT-2 1.5B	23.64%	58.33%	70.78%
GPT-Neo 2.7B	24.72%	57.54%	72.14%
GPT-3 Ada	24.29%	52.80%	68.88%
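Winogrande, Hellaswag, MathQA, PubMedQA and Piqa report accuracy on multiple-choice questions. The card does not describe the scoring protocol. A common approach (used, for example, by EleutherAI's lm-evaluation-harness) is to have the model compute the log-likelihood of each candidate answer given the question and to count an example as correct when the highest-scoring candidate is the gold answer. The sketch below assumes those log-likelihoods are already computed; the helper names and the toy scores are hypothetical, and real harnesses may also normalize scores by answer length.

```python
from typing import List, Sequence, Tuple


def predict_choice(choice_logliks: Sequence[float]) -> int:
    """Index of the candidate answer with the highest log-likelihood."""
    return max(range(len(choice_logliks)), key=lambda i: choice_logliks[i])


def mc_accuracy(examples: Sequence[Tuple[Sequence[float], int]]) -> float:
    """Fraction of examples whose top-scoring candidate equals the gold index."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(predict_choice(lls) == gold for lls, gold in examples)
    return correct / len(examples)


# Made-up log-likelihoods for three two-choice questions (illustration only)
toy: List[Tuple[List[float], int]] = [
    ([-1.2, -2.5], 0),  # model prefers choice 0, gold is 0: correct
    ([-3.1, -0.7], 1),  # model prefers choice 1, gold is 1: correct
    ([-0.4, -0.9], 1),  # model prefers choice 0, gold is 1: wrong
]
print(f"accuracy = {mc_accuracy(toy):.2%}")  # accuracy = 66.67%
```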

BibTeX Entry and Citation Info

@software{gpt-neo,
  author       = {Black, Sid and
                  Gao, Leo and
                  Wang, Phil and
                  Leahy, Connor and
                  Biderman, Stella},
  title        = {{GPT-Neo: Large Scale Autoregressive Language 
                   Modeling with Mesh-Tensorflow}},
  month        = mar,
  year         = 2021,
  note         = {{If you use this software, please cite it using 
                   these metadata.}},
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.5297715},
  url          = {https://doi.org/10.5281/zenodo.5297715}
}

@article{gao2020pile,
  title={The Pile: An 800GB Dataset of Diverse Text for Language Modeling},
  author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and others},
  journal={arXiv preprint arXiv:2101.00027},
  year={2020}
}