Model description: deberta-v3-large-zeroshot-v2.0

zeroshot-v2.0 series of models

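This series of models is designed for efficient zero-shot classification with the Hugging Face pipeline.
These models can classify without training data and run on both GPUs and CPUs.
An overview of the latest zero-shot classifiers is available in my Zeroshot Classifier Collection.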

The main update of this zeroshot-v2.0 series is that several models are trained on fully commercially-friendly data, for users with strict license requirements.

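These models can do one universal classification task: determine whether a hypothesis is "true" or "not true" given a text (entailment vs. not_entailment).
This task format is based on the Natural Language Inference (NLI) task.
The task is so universal that the Hugging Face pipeline can reformulate any classification task into it.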
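
As a rough illustration of this reformulation, the sketch below turns one text and a set of candidate labels into (premise, hypothesis) pairs, one per label. The model then judges each pair as entailment or not_entailment. The function name build_nli_pairs is hypothetical, and the code only mirrors the idea; it is not the pipeline's actual implementation.

```python
def build_nli_pairs(text, candidate_labels, hypothesis_template="This text is about {}"):
    """Return one (premise, hypothesis) pair per candidate label.

    Hypothetical helper for illustration only: the premise is the input text,
    and the hypothesis is the template filled with a verbalized class.
    """
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]


pairs = build_nli_pairs(
    "Angela Merkel is a politician in Germany and leader of the CDU",
    ["politics", "economy"],
)
for premise, hypothesis in pairs:
    print(hypothesis)
```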

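Training data

Models with a "-c" in the name are trained on two types of fully commercially-friendly data:

Synthetic data generated with Mixtral-8x7B-Instruct-v0.1. I first created a list of 500+ diverse text classification tasks for 25 professions in conversations with Mistral-large. The data was manually curated. I then used this as seed data to generate several hundred thousand texts for these tasks with Mixtral-8x7B-Instruct-v0.1. The final dataset used is available in the synthetic_zeroshot_mixtral_v0.1 dataset, in the subset mixtral_written_text_for_tasks_v4. Data curation was done in multiple iterations and will be improved in future iterations.
Two commercially-friendly NLI datasets: MNLI and FEVER-NLI. These datasets were added to increase generalization.
Models without a "-c" in the name also include a broader mix of training data with a broader mix of licenses: ANLI, WANLI, LingNLI, and all datasets in this list where used_in_v1.1==True.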

How to use the models

#!pip install transformers[sentencepiece]
from transformers import pipeline
text = "Angela Merkel is a politician in Germany and leader of the CDU"
hypothesis_template = "This text is about {}"
classes_verbalized = ["politics", "economy", "entertainment", "environment"]
zeroshot_classifier = pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-large-zeroshot-v2.0")  # change the model identifier here
output = zeroshot_classifier(text, classes_verbalized, hypothesis_template=hypothesis_template, multi_label=False)
print(output)

multi_label=False forces the model to decide on only one class. multi_label=True enables the model to choose multiple classes.
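
To make the difference concrete, here is a minimal sketch of how the two modes could turn NLI logits into class scores. It assumes that single-label mode normalizes the entailment logits across all candidate labels, and that multi-label mode scores each label independently from its own (not_entailment, entailment) pair. The logits are made up, and the pipeline's exact implementation may differ.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits):
    """multi_label=False (assumed behavior): labels compete, scores sum to 1."""
    return softmax(entailment_logits)


def multi_label_scores(logit_pairs):
    """multi_label=True (assumed behavior): each label is scored on its own."""
    return [softmax([not_ent, ent])[1] for not_ent, ent in logit_pairs]


# Hypothetical logits per candidate label: (not_entailment, entailment)
logit_pairs = [(-2.0, 3.0), (1.5, -1.0), (0.2, 0.4)]

single = single_label_scores([ent for _, ent in logit_pairs])
multi = multi_label_scores(logit_pairs)
print(single)
print(multi)
```

In single-label mode the scores form one distribution over the classes. In multi-label mode several classes can score above 0.5 at once, or none at all.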

Metrics

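The models were evaluated on 28 different text classification tasks with the f1_macro metric. The main reference point is facebook/bart-large-mnli, which is, at the time of writing (3 April 2024), the most used commercially-friendly zero-shot classifier.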
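
For readers unfamiliar with the metric, the following sketch computes f1_macro by hand: the F1 score is calculated per class and then averaged with equal weight per class, so rare classes count as much as frequent ones. The labels are toy data for illustration, not values from the evaluation. In practice, scikit-learn's f1_score with average="macro" is the usual way to compute this.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with two classes (not taken from the benchmark)
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
score = f1_macro(y_true, y_pred)
print(round(score, 4))
```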

	facebook/bart-large-mnli	roberta-base-zeroshot-v2.0-c	roberta-large-zeroshot-v2.0-c	deberta-v3-base-zeroshot-v2.0-c	deberta-v3-base-zeroshot-v2.0 (fewshot)	deberta-v3-large-zeroshot-v2.0-c	deberta-v3-large-zeroshot-v2.0 (fewshot)	bge-m3-zeroshot-v2.0-c	bge-m3-zeroshot-v2.0 (fewshot)
All datasets mean	0.497	0.587	0.622	0.619	0.643 (0.834)	0.676	0.673 (0.846)	0.59	(0.803)
amazonpolarity (2)	0.937	0.924	0.951	0.937	0.943 (0.961)	0.952	0.956 (0.968)	0.942	(0.951)
imdb (2)	0.892	0.871	0.904	0.893	0.899 (0.936)	0.923	0.918 (0.958)	0.873	(0.917)
appreviews (2)	0.934	0.913	0.937	0.938	0.945 (0.948)	0.943	0.949 (0.962)	0.932	(0.954)
yelpreviews (2)	0.948	0.953	0.977	0.979	0.975 (0.989)	0.988	0.985 (0.994)	0.973	(0.978)
rottentomatoes (2)	0.83	0.802	0.841	0.84	0.86 (0.902)	0.869	0.868 (0.908)	0.813	(0.866)
emotiondair (6)	0.455	0.482	0.486	0.459	0.495 (0.748)	0.499	0.484 (0.688)	0.453	(0.697)
emocontext (4)	0.497	0.555	0.63	0.59	0.592 (0.799)	0.699	0.676 (0.81)	0.61	(0.798)
empathetic (32)	0.371	0.374	0.404	0.378	0.405 (0.53)	0.447	0.478 (0.555)	0.387	(0.455)
financialphrasebank (3)	0.465	0.562	0.455	0.714	0.669 (0.906)	0.691	0.582 (0.913)	0.504	(0.895)
banking77 (72)	0.312	0.124	0.29	0.421	0.446 (0.751)	0.513	0.567 (0.766)	0.387	(0.715)
massive (59)	0.43	0.428	0.543	0.512	0.52 (0.755)	0.526	0.518 (0.789)	0.414	(0.692)
wikitoxic_toxicaggreg (2)	0.547	0.751	0.766	0.751	0.769 (0.904)	0.741	0.787 (0.911)	0.736	(0.9)
wikitoxic_obscene (2)	0.713	0.817	0.854	0.853	0.869 (0.922)	0.883	0.893 (0.933)	0.783	(0.914)
wikitoxic_threat (2)	0.295	0.71	0.817	0.813	0.87 (0.946)	0.827	0.879 (0.952)	0.68	(0.947)
wikitoxic_insult (2)	0.372	0.724	0.798	0.759	0.811 (0.912)	0.77	0.779 (0.924)	0.783	(0.915)
wikitoxic_identityhate (2)	0.473	0.774	0.798	0.774	0.765 (0.938)	0.797	0.806 (0.948)	0.761	(0.931)
hateoffensive (3)	0.161	0.352	0.29	0.315	0.371 (0.862)	0.47	0.461 (0.847)	0.291	(0.823)
hatexplain (3)	0.239	0.396	0.314	0.376	0.369 (0.765)	0.378	0.389 (0.764)	0.29	(0.729)
biasframes_offensive (2)	0.336	0.571	0.583	0.544	0.601 (0.867)	0.644	0.656 (0.883)	0.541	(0.855)
biasframes_sex (2)	0.263	0.617	0.835	0.741	0.809 (0.922)	0.846	0.815 (0.946)	0.748	(0.905)
biasframes_intent (2)	0.616	0.531	0.635	0.554	0.61 (0.881)	0.696	0.687 (0.891)	0.467	(0.868)
agnews (4)	0.703	0.758	0.745	0.68	0.742 (0.898)	0.819	0.771 (0.898)	0.687	(0.892)
yahootopics (10)	0.299	0.543	0.62	0.578	0.564 (0.722)	0.621	0.613 (0.738)	0.587	(0.711)
trueteacher (2)	0.491	0.469	0.402	0.431	0.479 (0.82)	0.459	0.538 (0.846)	0.471	(0.518)
spam (2)	0.505	0.528	0.504	0.507	0.464 (0.973)	0.74	0.597 (0.983)	0.441	(0.978)
wellformedquery (2)	0.407	0.333	0.333	0.335	0.491 (0.769)	0.334	0.429 (0.815)	0.361	(0.718)
manifesto (56)	0.084	0.102	0.182	0.17	0.187 (0.376)	0.258	0.256 (0.408)	0.147	(0.331)
capsotu (21)	0.34	0.479	0.523	0.502	0.477 (0.664)	0.603	0.502 (0.686)	0.472	(0.644)

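These numbers indicate zero-shot performance, as no data from these datasets was added to the training mix. Note that models without a "-c" in the title were evaluated twice: one run without any data from these 28 datasets, to test pure zero-shot performance (the first number in the respective column), and a final run including up to 500 training data points per class from each of the 28 datasets (the second number, in brackets, labeled "fewshot"). No model was trained on test data.

Details on the different datasets are available here: https://github.com/MoritzLaurer/zeroshot-classifier/blob/main/v1_human_data/datasets_overview.csv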

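When to use which model

deberta-v3-zeroshot vs. roberta-zeroshot: deberta-v3 performs clearly better than roberta, but it is a bit slower. roberta is directly compatible with Hugging Face's production inference TEI containers and flash attention. These containers are a good choice for production use cases. tl;dr: For accuracy, use a deberta-v3 model. If production inference speed is a concern, consider a roberta model (e.g. in a TEI container and HF Inference Endpoints).
Commercial use cases: models with "-c" in the title are guaranteed to be trained only on commercially-friendly data. Models without a "-c" were trained on more data and perform better, but include data with non-commercial licenses. Legal opinions diverge on whether the training data affects the license of the trained model. For users with strict legal requirements, the models with "-c" in the title are recommended.
Multilingual/non-English use cases: use bge-m3-zeroshot-v2.0 or bge-m3-zeroshot-v2.0-c. Note that multilingual models perform worse than English-only models. You can therefore also first machine translate your texts to English with libraries like EasyNMT and then apply any English-only model to the translated data. Machine translation also facilitates validation in case your team does not speak all languages in the data.
Context window: the bge-m3 models can process up to 8192 tokens. The other models can process up to 512 tokens. Note that longer text inputs make the model slower and decrease performance, so if you are only working with texts of up to roughly 400 words / 1 page, use e.g. a deberta model for better performance.
The latest updates on new models are always available in the Zeroshot Classifier Collection.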
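
The guidance above can be condensed into a small decision helper. This is only a sketch of one way to encode it: the function choose_model and its parameters are hypothetical, it always picks the large variants, and it assumes that every model lives under the same MoritzLaurer/ namespace as the identifier used in the usage example.

```python
def choose_model(multilingual=False, commercial_only=False, prioritize_speed=False):
    """Pick a model identifier following the rules of thumb in this section.

    Hypothetical helper; the "MoritzLaurer/" prefix for all models is an
    assumption based on the identifier shown in the usage example.
    """
    if multilingual:
        # bge-m3 is the multilingual option (and has the 8192-token window)
        name = "bge-m3-zeroshot-v2.0"
    elif prioritize_speed:
        # Only "-c" roberta variants are listed in the results table,
        # so the commercial flag makes no difference here
        return "MoritzLaurer/roberta-large-zeroshot-v2.0-c"
    else:
        # Best accuracy among the English-only models
        name = "deberta-v3-large-zeroshot-v2.0"
    if commercial_only:
        name += "-c"
    return "MoritzLaurer/" + name


print(choose_model())
print(choose_model(commercial_only=True))
print(choose_model(multilingual=True, commercial_only=True))
```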

Reproduction

Reproduction code is available in the v2_synthetic_data directory here: https://github.com/MoritzLaurer/zeroshot-classifier/tree/main

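Limitations and bias

The model can only do text classification tasks.

Biases can come from the underlying foundation model, the human NLI training data and the synthetic data generated by Mixtral.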

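License

The foundation model was published under the MIT license. The licenses of the training data vary depending on the model, see above.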

Citation

This model is an extension of the research described in this paper.

If you use this model academically, please cite:

@misc{laurer_building_2023,
	title = {Building {Efficient} {Universal} {Classifiers} with {Natural} {Language} {Inference}},
	url = {http://arxiv.org/abs/2312.17543},
	doi = {10.48550/arXiv.2312.17543},
	abstract = {Generative Large Language Models (LLMs) have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allow them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share has been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4\%.},
	urldate = {2024-01-05},
	publisher = {arXiv},
	author = {Laurer, Moritz and van Atteveldt, Wouter and Casas, Andreu and Welbers, Kasper},
	month = dec,
	year = {2023},
	note = {arXiv:2312.17543 [cs]},
	keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
}

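Ideas for cooperation or questions?

If you have questions or ideas for cooperation, contact me at moritz{at}huggingface{dot}co or on LinkedIn.

Flexible usage and "prompting"

You can formulate your own hypotheses by changing the hypothesis_template of the zero-shot pipeline. Similar to "prompt engineering" for LLMs, you can test different formulations of your hypothesis_template and verbalized classes to improve performance.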

from transformers import pipeline
text = "Angela Merkel is a politician in Germany and leader of the CDU"
# formulation 1
hypothesis_template = "This text is about {}"
classes_verbalized = ["politics", "economy", "entertainment", "environment"]
# formulation 2 depending on your use-case (overwrites formulation 1 if both are left in)
hypothesis_template = "The topic of this text is {}"
classes_verbalized = ["political activities", "economic policy", "entertainment or music", "environmental protection"]
# test different formulations
zeroshot_classifier = pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-large-zeroshot-v2.0")  # change the model identifier here
output = zeroshot_classifier(text, classes_verbalized, hypothesis_template=hypothesis_template, multi_label=False)
print(output)
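
To compare several formulations side by side instead of overwriting one with the other, a loop along the following lines may help. The helper compare_formulations is hypothetical. It assumes the classifier returns a dict with "labels" and "scores" sorted from most to least likely, which is how the zero-shot pipeline reports results for a single input text.

```python
def compare_formulations(classifier, text, formulations):
    """Run the classifier once per (template, classes) pair.

    Hypothetical helper: returns (template, top label, top score) tuples.
    """
    results = []
    for template, classes in formulations:
        output = classifier(text, classes, hypothesis_template=template, multi_label=False)
        # Assumes labels and scores are sorted by descending score
        results.append((template, output["labels"][0], output["scores"][0]))
    return results


formulations = [
    ("This text is about {}", ["politics", "economy", "entertainment", "environment"]),
    (
        "The topic of this text is {}",
        ["political activities", "economic policy", "entertainment or music", "environmental protection"],
    ),
]

# Usage with the real pipeline (downloads the model on first run):
# from transformers import pipeline
# zeroshot_classifier = pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-large-zeroshot-v2.0")
# text = "Angela Merkel is a politician in Germany and leader of the CDU"
# for template, label, score in compare_formulations(zeroshot_classifier, text, formulations):
#     print(template, "->", label, round(score, 3))
```

A top score from one template is not directly comparable with another's as a measure of quality, so the formulations are best judged on a small labeled validation set.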