😼 CatPPT

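Introducing "CatPPT", the purrfect alternative to that other big cat in town, the one known for keeping all its secrets to itself! Our feline friend was created by merging the openchat and neuralchat models with the Gradient SLERP method, and was then fine-tuned on the no_robots dataset for chat.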
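The merge step can be illustrated with a minimal sketch of spherical linear interpolation (SLERP) between two weight tensors. This is an illustration under stated assumptions, not the exact recipe used for CatPPT: the card does not give the interpolation schedule, so the `gradient_t` helper (a hypothetical name) simply assumes that "gradient" means the interpolation factor varies linearly with layer depth.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equally shaped weight tensors."""
    a = np.asarray(v0, dtype=np.float64).ravel()
    b = np.asarray(v1, dtype=np.float64).ravel()
    # Angle between the flattened tensors, measured on unit-normalized copies.
    a_unit = a / (np.linalg.norm(a) + eps)
    b_unit = b / (np.linalg.norm(b) + eps)
    theta = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if np.sin(theta) < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        merged = (1.0 - t) * a + t * b
    else:
        merged = (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
    return merged.reshape(np.shape(v0))

def gradient_t(layer_idx, num_layers, t_start=0.0, t_end=1.0):
    """Hypothetical schedule: interpolation factor changes linearly with depth."""
    if num_layers == 1:
        return t_start
    return t_start + (t_end - t_start) * layer_idx / (num_layers - 1)

# Toy example: merge two tiny "layers" halfway between the parent weights.
layer_a = np.array([1.0, 0.0])
layer_b = np.array([0.0, 1.0])
print(slerp(0.5, layer_a, layer_b))
```

With `t=0` the result equals the first tensor and with `t=1` the second, so a depth-dependent `t` lets early and late layers lean toward different parent models.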

This is the top-performing 7B model on the leaderboard that shows no trace of evaluation data contamination.

Model date

rishiraj/CatPPT was trained between 15 and 17 December 2023.

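Evaluation

The model achieves the following results on the [Open_LLM_Leaderboard]. At the time of release, CatPPT was the highest-ranked 7B chat model on the leaderboard that is free of evaluation data contamination.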

Model	Average	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
rishiraj/CatPPT	72.32	68.09	86.69	65.16	61.55	81.61	70.81
Intel/neural-chat-7b-v3-3	69.83	66.89	85.26	63.07	63.01	79.64	61.11
openchat/openchat-3.5-1210	68.89	64.93	84.92	64.62	52.15	80.74	65.96
meta-math/MetaMath-Mistral-7B	65.78	60.67	82.58	61.95	44.89	75.77	68.84
Deci/DeciLM-7B-instruct	63.19	61.01	82.37	60.24	49.75	79.72	46.02
mistralai/Mistral-7B-Instruct-v0.2	65.71	63.14	84.88	60.78	68.26	77.19	40.03
mistralai/Mixtral-8x7B-Instruct-v0.1	72.62	70.22	87.63	71.16	64.58	81.37	60.73
meta-llama/Llama-2-70b-hf	67.87	67.32	87.33	69.83	44.92	83.74	54.06
tiiuae/falcon-180B	67.85	69.45	88.86	70.5	45.47	86.9	45.94
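As a quick sanity check on the table, the Average column appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it for the first three rows; the `leaderboard_average` helper is a name introduced here for illustration only.

```python
# Benchmark order: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
scores = {
    "rishiraj/CatPPT": [68.09, 86.69, 65.16, 61.55, 81.61, 70.81],
    "Intel/neural-chat-7b-v3-3": [66.89, 85.26, 63.07, 63.01, 79.64, 61.11],
    "openchat/openchat-3.5-1210": [64.93, 84.92, 64.62, 52.15, 80.74, 65.96],
}

def leaderboard_average(benchmarks):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(benchmarks) / len(benchmarks), 2)

for model, benchmarks in scores.items():
    print(model, leaderboard_average(benchmarks))
```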

Inference procedure

Here's how you can run the model using the pipeline() function from 🤗 Transformers:

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="rishiraj/CatPPT", torch_dtype=torch.bfloat16, device_map="auto")


messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate"
    },
    {
        "role": "user",
        "content": "How many helicopters can a human eat in one sitting?"
    }
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
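When the pipeline is given a plain string prompt, the generated_text it returns includes the prompt followed by the completion by default. To show only the model's reply, the prompt prefix can be stripped off. A minimal sketch follows; `extract_reply` is a hypothetical helper, not part of Transformers.

```python
def extract_reply(generated_text: str, prompt: str) -> str:
    """Return only the newly generated text, dropping the echoed prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # If the prompt is not echoed back, return the text unchanged.
    return generated_text.strip()

# Stand-in strings; in practice pass outputs[0]["generated_text"] and prompt.
example_prompt = "User: Ahoy?\nAssistant:"
example_output = "User: Ahoy?\nAssistant: Arr, ahoy matey!"
print(extract_reply(example_output, example_prompt))
```

Alternatively, passing return_full_text=False in the pipe(...) call should make the pipeline return only the completion.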

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 128
total_train_batch_size: 512
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
num_epochs: 1
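Two of these settings are easier to read with a little arithmetic. The sketch below shows how the total batch size follows from the per-device batch size and gradient accumulation, and what a cosine learning-rate schedule looks like. It rests on assumptions: the card does not state the number of devices (4 × 128 already equals 512, so `num_devices` defaults to 1 here), no warmup is listed so none is modeled, and `total_steps=19` is an illustrative value, not a reported one.

```python
import math

def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device_batch * accumulation_steps * num_devices

def cosine_lr(step, total_steps, base_lr=2e-05):
    """Cosine decay from base_lr to 0 over total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch_size(4, 128))  # matches total_train_batch_size: 512
for step in (0, 5, 10, 19):
    print(step, cosine_lr(step, total_steps=19))
```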

Training results

Training Loss	Epoch	Step	Validation Loss
1.9947	0.16	3	2.0093
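The single logged row also gives a rough sense of scale: step 3 corresponds to about 0.16 of an epoch, which, combined with the total batch size of 512, implies roughly how many optimizer steps and training examples make up one epoch. This is a back-of-the-envelope estimate only, since the epoch value is rounded to two decimals; the helper names are introduced here for illustration.

```python
def steps_per_epoch(step, epoch_fraction):
    """Estimate optimizer steps in one epoch from a logged (step, epoch) pair."""
    return step / epoch_fraction

def examples_per_epoch(step, epoch_fraction, total_batch_size):
    """Estimate training examples per epoch from the same logged pair."""
    return steps_per_epoch(step, epoch_fraction) * total_batch_size

print(steps_per_epoch(3, 0.16))          # about 18.75 steps
print(examples_per_epoch(3, 0.16, 512))  # about 9600 examples
```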
