Model Card for DeBERTa-v3-small-tasksource-nli

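DeBERTa-v3-small with the context length extended to 1680 tokens, fine-tuned on tasksource for 250k steps. I oversampled long-context NLI tasks (ConTRoL, doc-nli). Training data include HelpSteer v1/v2, logical reasoning tasks (FOLIO, FOL-nli, LogicNLI, and others), OASST, hh/rlhf, linguistics-oriented NLI tasks, tasksource-dpo, and fact verification tasks.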

This model is suitable for long-context NLI, or as a backbone for fine-tuning reward models or classifiers.

This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:

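Zero-shot entailment-based classification for arbitrary labels [ZS].
Natural language inference [NLI].
Further fine-tuning on a new task or a tasksource task (classification, token classification, or multiple-choice) [FT].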

test_name	accuracy (%)
anli/a1	57.2
anli/a2	46.1
anli/a3	47.2
nli_fever	71.7
FOLIO	47.1
ConTRoL-nli	52.2
cladder	52.8
zero-shot-label-nli	70.0
chatbot_arena_conversations	67.8
oasst2_pairwise_rlhf_reward	75.6
doc-nli	75.0

For comparison, zero-shot GPT-4 scores 61% on FOLIO (logical reasoning), 62% on cladder (probabilistic reasoning), and 56.4% on ConTRoL (long-context NLI).

[ZS] Zero-shot classification pipeline

import argparse
import time

from openmind import is_torch_npu_available, pipeline

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        "-m",
        type=str,
        help="Path to model",
        default="zhouhui/deberta-small-long-nli",
    )
    args = parser.parse_args()
    return args

def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Use the Ascend NPU when one is available, otherwise fall back to CPU.
    device = "npu:0" if is_torch_npu_available() else "cpu"

    classifier = pipeline(
        "zero-shot-classification",
        model=model_path,
        device=device,
        use_fast=True,
        multi_label=True,
        trust_remote_code=True,
    )
    # Classify a short example sentence against three candidate labels.
    
    start_time = time.time()
    text = "one day I will see the world"
    candidate_labels = ['travel', 'cooking', 'dancing']
    res = classifier(text, candidate_labels)
    # Stop the timer before printing so only inference is measured.
    end_time = time.time()

    print(f"Result: {res}")
    print(f"Device: {device}, inference time: {end_time - start_time} seconds")


if __name__ == "__main__":
    main()

The NLI training data of this model include label-nli, an NLI dataset constructed specifically to improve this kind of zero-shot classification.
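To make the entailment-based approach more concrete, here is a minimal sketch of how candidate labels can be turned into NLI hypotheses and how per-label logits can be converted into scores. It does not load the model: the logits are hypothetical values chosen for illustration, and the helper names (build_hypotheses, score_labels) are my own. The template "This example is {}." matches, to my understanding, the default used by the Hugging Face zero-shot pipeline, and the scoring follows the usual multi-label and single-label conventions rather than anything specific to this checkpoint.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_labels(entail_logits, contra_logits, multi_label=True):
    """Convert per-label NLI logits into label scores.

    multi_label=True: each label is scored on its own as P(entailment),
    taken from a softmax over (contradiction, entailment).
    multi_label=False: the entailment logits compete in a single softmax
    across all labels, so the scores sum to 1.
    """
    if multi_label:
        return [softmax([c, e])[1] for e, c in zip(entail_logits, contra_logits)]
    return softmax(entail_logits)


labels = ["travel", "cooking", "dancing"]
hypotheses = build_hypotheses(labels)

# Hypothetical logits for illustration only; a real run would obtain them
# by feeding each (text, hypothesis) pair to the NLI model.
entail_logits = [3.0, -2.0, -1.0]
contra_logits = [-3.0, 2.0, 1.0]

scores = score_labels(entail_logits, contra_logits, multi_label=True)
for label, hypothesis, score in zip(labels, hypotheses, scores):
    print(f"{label}: {hypothesis!r} -> {score:.3f}")
```

With multi_label=True the scores are independent and need not sum to 1, which suits texts that can belong to several categories at once. With multi_label=False exactly one label is favored.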

[NLI] Natural language inference pipeline

from transformers import pipeline
pipe = pipeline("text-classification",model="tasksource/deberta-small-long-nli")
pipe([dict(text='there is a cat',
  text_pair='there is a black cat')]) #list of (premise,hypothesis)
# [{'label': 'neutral', 'score': 0.9952911138534546}]

[FT] Tasknet: 3 lines fine-tuning

# !pip install tasknet
import tasknet as tn
hparams=dict(model_name='tasksource/deberta-small-long-nli', learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()

Software and training details

The model was trained on 600 tasks for 250k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 14 days on an Nvidia A30 24GB GPU.
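As a rough sanity check on these figures, the short calculation below derives the total number of training examples processed and the average throughput. It assumes that 384 is the effective number of examples per optimizer step and that the 14 days were continuous wall-clock training time; neither assumption is confirmed by the card, so treat the throughput as an order-of-magnitude estimate.

```python
# Figures taken from the training details above.
steps = 250_000
batch_size = 384       # assumed to be the effective batch per optimizer step
training_days = 14     # assumed to be continuous wall-clock time

examples_seen = steps * batch_size
seconds = training_days * 24 * 60 * 60
steps_per_second = steps / seconds
examples_per_second = examples_seen / seconds

print(f"examples seen: {examples_seen:,}")               # 96,000,000
print(f"steps per second: {steps_per_second:.3f}")       # 0.207
print(f"examples per second: {examples_per_second:.1f}")  # 79.4
```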

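This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time so that the model can also be used without it. All multiple-choice tasks used the same classification layers. For classification tasks, models shared weights if their labels matched.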
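The two mechanisms described above can be sketched in plain Python. This is an illustration under stated assumptions, not the tasksource or tasknet implementation: the names pick_cls_embedding and HeadRegistry are hypothetical, embeddings and heads are stand-in values rather than tensors, and "labels match" is assumed to mean an identical ordered tuple of label names.

```python
import random

TASK_EMBEDDING_DROP_PROB = 0.10  # drop rate stated in the model card


def pick_cls_embedding(task_embedding, generic_embedding, rng,
                       drop_prob=TASK_EMBEDDING_DROP_PROB):
    """Return the task-specific CLS embedding, or the generic one when dropped."""
    if rng.random() < drop_prob:
        return generic_embedding
    return task_embedding


class HeadRegistry:
    """Hypothetical registry that reuses one head per distinct label set."""

    def __init__(self):
        self._heads = {}

    def get(self, labels):
        # Assumption: two tasks "match" when their ordered label names are equal.
        key = tuple(labels)
        if key not in self._heads:
            # Stand-in for a real classification layer.
            self._heads[key] = {"labels": key, "num_classes": len(key)}
        return self._heads[key]


# Simulate many training steps to check the empirical drop rate.
rng = random.Random(0)
picks = [pick_cls_embedding("task", "generic", rng) for _ in range(10_000)]
drop_rate = picks.count("generic") / len(picks)
print(f"empirical drop rate: {drop_rate:.3f}")

registry = HeadRegistry()
nli_labels = ("entailment", "neutral", "contradiction")
head_a = registry.get(nli_labels)                 # first NLI task creates the head
head_b = registry.get(nli_labels)                 # second NLI task reuses it
head_c = registry.get(("positive", "negative"))   # different labels, new head
print(head_a is head_b, head_a is head_c)
```

Because the generic embedding is seen during a fraction of training steps, the encoder stays usable on tasks for which no task-specific embedding exists, which is the case when fine-tuning on a new task.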

https://github.com/sileod/tasksource/
https://github.com/sileod/tasknet/
Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

Citation

More details on this model are available in the paper:

@inproceedings{sileo-2024-tasksource,
    title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1361",
    pages = "15655--15684",
}

Model Card Contact

damien.sileo@inria.fr