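This model is based on DeBERTa-v3-small, with the context length extended to 1680 tokens, and was fine-tuned on tasksource for 250k steps. I oversampled long NLI tasks (ConTRoL, doc-nli). Training data include HelpSteer v1/v2, logical reasoning tasks (FOLIO, FOL-nli, LogicNLI, etc.), OASST, hh/rlhf, linguistics-oriented NLI tasks, tasksource-dpo, and fact verification tasks.

This model is suitable for long-context NLI, or as a backbone for fine-tuning reward models or classifiers.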
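
This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI) and can be used for zero-shot entailment-based classification with arbitrary labels, natural language inference, and further fine-tuning on a new task. Accuracy on selected tasks:
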
| test_name | accuracy (%) |
|---|---|
| anli/a1 | 57.2 |
| anli/a2 | 46.1 |
| anli/a3 | 47.2 |
| nli_fever | 71.7 |
| FOLIO | 47.1 |
| ConTRoL-nli | 52.2 |
| cladder | 52.8 |
| zero-shot-label-nli | 70.0 |
| chatbot_arena_conversations | 67.8 |
| oasst2_pairwise_rlhf_reward | 75.6 |
| doc-nli | 75.0 |

For reference, zero-shot GPT-4 scores 61% on FOLIO (logical reasoning), 62% on cladder (probabilistic reasoning), and 56.4% on ConTRoL (long-context NLI).

Zero-shot classification with the `openmind` `pipeline`:

```python
import argparse
import time

from openmind import is_torch_npu_available, pipeline


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        "-m",
        type=str,
        help="Path to model",
        default="zhouhui/deberta-small-long-nli",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Use the first NPU when one is available, otherwise fall back to CPU.
    device = "npu:0" if is_torch_npu_available() else "cpu"

    classifier = pipeline(
        "zero-shot-classification",
        model=model_path,
        device=device,
        use_fast=True,
        multi_label=True,
        trust_remote_code=True,
    )

    # Classify a short sentence against three candidate labels.
    text = "one day I will see the world"
    candidate_labels = ["travel", "cooking", "dancing"]

    start_time = time.time()
    res = classifier(text, candidate_labels)
    end_time = time.time()

    print(f"Result: {res}")
    print(f"Device: {device}, inference time: {end_time - start_time} s")


if __name__ == "__main__":
    main()
```

The model's NLI training data includes label-nli, an NLI dataset built specifically to improve this kind of zero-shot classification.
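
The zero-shot pipeline call above relies on entailment scoring: each candidate label is turned into a hypothesis sentence, and the NLI model judges whether the input text entails it. The sketch below illustrates only the post-processing step, in simplified form, as done by the Hugging Face `transformers` zero-shot pipeline (default template `"This example is {}."`). That `openmind` behaves identically is an assumption, and the logits are made up for illustration; `build_hypotheses` and `score_labels` are hypothetical helper names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def score_labels(logits_per_label, multi_label):
    """logits_per_label holds one (contradiction_logit, entailment_logit) pair per label."""
    if multi_label:
        # Each label is judged on its own: entailment vs. contradiction.
        return [softmax([c, e])[1] for c, e in logits_per_label]
    # Single-label mode: entailment logits compete across all labels.
    return softmax([e for _, e in logits_per_label])


labels = ["travel", "cooking", "dancing"]
hypotheses = build_hypotheses(labels)

# Made-up logits for illustration; a real run would obtain them from the NLI model.
fake_logits = [(-2.0, 3.0), (1.5, -1.0), (1.0, -0.5)]

multi_scores = score_labels(fake_logits, multi_label=True)
single_scores = score_labels(fake_logits, multi_label=False)

for label, m, s in zip(labels, multi_scores, single_scores):
    print(f"{label}: multi_label={m:.3f}, single_label={s:.3f}")
```

With `multi_label=True`, as in the pipeline call above, the scores are independent and need not sum to 1, so several labels can be accepted at once.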

Natural language inference with the `transformers` `pipeline`:

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="tasksource/deberta-small-long-nli")
# Input is a list of (premise, hypothesis) pairs.
pipe([dict(text="there is a cat", text_pair="there is a black cat")])
# [{'label': 'neutral', 'score': 0.9952911138534546}]
```

Fine-tuning on a new task with tasknet:

```python
# !pip install tasknet
import tasknet as tn

hparams = dict(model_name="tasksource/deberta-small-long-nli", learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()
```

The model was trained on 600 tasks for 250k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 14 days on an Nvidia A30 24GB GPU.
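
The training figures above imply a rough throughput, worked out in the short calculation below. It assumes that every step processes exactly one batch of 384 examples and that the 14 days were spent entirely on training; the per-task figure is only an average, since long NLI tasks were oversampled.

```python
# Figures taken from the training description.
STEPS = 250_000
BATCH_SIZE = 384
DAYS = 14
NUM_TASKS = 600

examples_seen = STEPS * BATCH_SIZE              # total training examples processed
train_seconds = DAYS * 24 * 3600                # wall-clock training time in seconds
seconds_per_step = train_seconds / STEPS
examples_per_second = examples_seen / train_seconds
avg_examples_per_task = examples_seen / NUM_TASKS  # average only; sampling was not uniform

print(f"examples seen:        {examples_seen:,}")
print(f"seconds per step:     {seconds_per_step:.2f}")
print(f"examples per second:  {examples_per_second:.1f}")
print(f"avg examples / task:  {avg_examples_per_task:,.0f}")
```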
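
This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which was dropped 10% of the time so that the model can also be used without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
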
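
The two mechanisms described above, dropping the task-specific CLS embedding and sharing classification heads between tasks with matching labels, can be sketched in plain Python. This is an illustrative sketch, not the tasknet implementation: it assumes that "dropped" means falling back to a task-agnostic default embedding and that "matching labels" means an identical tuple of label names. `choose_cls_embedding` and `HeadRegistry` are hypothetical names, and strings and dictionaries stand in for real tensors and layers.

```python
import random

DROP_PROB = 0.10  # from the text: the task CLS embedding is dropped 10% of the time


def choose_cls_embedding(task_name, task_cls_table, default_cls, rng):
    """Pick the CLS embedding for one training example.

    Assumption: a dropped embedding is replaced by a task-agnostic default,
    so the encoder also learns to work without task information.
    """
    if rng.random() < DROP_PROB:
        return default_cls
    return task_cls_table[task_name]


class HeadRegistry:
    """Hands out one classification head per distinct label tuple."""

    def __init__(self):
        self._heads = {}

    def get_head(self, labels):
        key = tuple(labels)
        if key not in self._heads:
            # Placeholder for a real linear layer: one weight slot per label.
            self._heads[key] = {"labels": key, "weights": [0.0] * len(key)}
        return self._heads[key]


rng = random.Random(0)
task_cls_table = {"glue/rte": "cls[glue/rte]", "anli/a1": "cls[anli/a1]"}
draws = [
    choose_cls_embedding("glue/rte", task_cls_table, "cls[default]", rng)
    for _ in range(10_000)
]
drop_rate = draws.count("cls[default]") / len(draws)

registry = HeadRegistry()
nli_labels = ["entailment", "neutral", "contradiction"]
head_a = registry.get_head(nli_labels)
head_b = registry.get_head(list(nli_labels))  # same labels, so the same head object
head_c = registry.get_head(["entailment", "not_entailment"])  # different labels, new head

print(f"observed drop rate: {drop_rate:.3f}")
print(f"heads shared for matching labels: {head_a is head_b}")
```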
https://github.com/sileod/tasksource/
https://github.com/sileod/tasknet/

Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

For more details about this model, see the paper:

```bibtex
@inproceedings{sileo-2024-tasksource,
    title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1361",
    pages = "15655--15684",
}
```