gliner-model-merge-large-v1.0

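xomad/gliner-model-merge-large-v1.0 is built on the pretrained model knowledgator/gliner-multitask-large-v0.5 and was developed to explore what model merging techniques can achieve. Merging raised the average F1 score by 3.25 percentage points, from 0.6276 to 0.6601.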

To keep the model broadly usable under the Apache-2.0 license, it was trained only on datasets with commercially friendly licenses. The training datasets are:

  • knowledgator/GLINER-multi-task-synthetic-data
  • EmergentMethods/AskNews-NER-v0
  • urchade/pile-mistral-v0.1
  • MultiCoNER/multiconer_v2
  • DFKI-SLT/few-nerd

⚙️ Finetuning process

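The process starts from the base model knowledgator/gliner-multitask-large-v0.5. Our model xomad/gliner-model-merge-large-v1.0 is finetuned on each of the datasets above separately, and multiple checkpoints are saved during each finetuning run. We collect all of these checkpoints into a single pool, then apply the Model soups technique to produce several merged models: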

  • uniform_merged
  • greedy_on_random
  • greedy_on_sorted
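
The three soup variants can be sketched as follows. This is a minimal illustration, not the training code behind this model: a checkpoint is represented as a plain dict of floats, `evaluate` is a hypothetical validation scorer, and the reading of greedy_on_random and greedy_on_sorted as greedy soups over a randomly ordered and a score-sorted pool is an assumption based on the names and on the Model soups paper.

```python
import random
from typing import Callable, Dict, List, Sequence

# Stand-in for a model state dict. Values are plain floats here; the same
# arithmetic also works for NumPy arrays or torch tensors.
Weights = Dict[str, float]


def average(models: Sequence[Weights]) -> Weights:
    """Parameter-wise mean of several checkpoints that share the same keys."""
    n = len(models)
    return {name: sum(m[name] for m in models) / n for name in models[0]}


def greedy_soup(candidates: Sequence[Weights],
                evaluate: Callable[[Weights], float]) -> Weights:
    """Add checkpoints one at a time, keeping each one only if the
    validation score of the averaged soup does not drop."""
    kept: List[Weights] = []
    best = float("-inf")
    for cand in candidates:
        trial = kept + [cand]
        score = evaluate(average(trial))
        if score >= best:
            kept, best = trial, score
    return average(kept)


# Toy pool: one scalar "weight" per checkpoint; the toy score peaks at w == 1.0.
pool = [{"w": 5.0}, {"w": 0.0}, {"w": 2.0}]


def toy_score(model: Weights) -> float:
    return -(model["w"] - 1.0) ** 2


# Uniform soup: average every checkpoint in the pool.
uniform_merged = average(pool)

# Greedy soup over the pool sorted by individual score, best first.
greedy_on_sorted = greedy_soup(sorted(pool, key=toy_score, reverse=True), toy_score)

# Greedy soup over a shuffled pool (the seed is fixed only for repeatability).
shuffled = random.Random(0).sample(pool, len(pool))
greedy_on_random = greedy_soup(shuffled, toy_score)
```

With real checkpoints, `evaluate` would load the averaged weights into the model and return a held-out F1 score; the greedy variants cost one evaluation per checkpoint in the pool.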

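Next, we apply the WiSE-FT merging technique: pairs of models are chosen from the group formed by the three merged models above plus the original model, and each pair is merged to produce the wise_ft_merged model. This completes the first finetuning phase.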
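
WiSE-FT merges two models by linear interpolation of their weights, theta = (1 - alpha) * theta_a + alpha * theta_b. The sketch below tries every pair from a group of named models over a small grid of alpha values and keeps the best-scoring result. The grid, the selection by validation score, and the helper names are assumptions for illustration; the description above does not state how the pair or the coefficient was chosen.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Sequence, Tuple

# Stand-in for a model state dict (floats here; tensors work the same way).
Weights = Dict[str, float]


def interpolate(theta_a: Weights, theta_b: Weights, alpha: float) -> Weights:
    """WiSE-FT style weight-space interpolation between two models."""
    return {k: (1.0 - alpha) * theta_a[k] + alpha * theta_b[k] for k in theta_a}


def best_pair_merge(
    models: Dict[str, Weights],
    evaluate: Callable[[Weights], float],
    alphas: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> Optional[Tuple[float, str, str, float, Weights]]:
    """Return (score, name_a, name_b, alpha, merged) for the best pair found."""
    best = None
    for (name_a, a), (name_b, b) in combinations(models.items(), 2):
        for alpha in alphas:
            merged = interpolate(a, b, alpha)
            score = evaluate(merged)
            if best is None or score > best[0]:
                best = (score, name_a, name_b, alpha, merged)
    return best


# Toy group mirroring the four candidates named in the text.
group = {
    "original": {"w": 0.0},
    "uniform_merged": {"w": 4.0},
    "greedy_on_random": {"w": 3.0},
    "greedy_on_sorted": {"w": 2.0},
}


def toy_score(model: Weights) -> float:
    # Hypothetical validation score that peaks at w == 1.0.
    return -(model["w"] - 1.0) ** 2


result = best_pair_merge(group, toy_score)
```

Because alpha = 0 and alpha = 1 are in the grid, the search also covers the unmerged endpoints, so the selected model never scores worse on the validation metric than any member of the group.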

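The second finetuning phase repeats this process with wise_ft_merged as the new starting point and produces the final model. The whole finetuning flow is shown in the figure below: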

[Figure: Finetuning flow]

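The finetuned models in the pool and the merged models were evaluated on the CrossNER and TwitterNER benchmarks, and the results are plotted in the two figures below (as crossner_f1 and other_f1, respectively).

First finetuning phase: [Figure: 1st finetuning phase]

Second finetuning phase: [Figure: 2nd finetuning phase]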

🛠️ Installation

To use this model, you must install the GLiNER Python library:

pip install gliner

Once the GLiNER library is installed, import the GLiNER class and load this model with GLiNER.from_pretrained.

💻 Usage

from gliner import GLiNER

model = GLiNER.from_pretrained("xomad/gliner-model-merge-large-v1.0")

text = """
Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975 to develop and sell BASIC interpreters for the Altair 8800. During his career at Microsoft, Gates held the positions of chairman, chief executive officer, president and chief software architect, while also being the largest individual shareholder until May 2014.
"""

labels = ["founder", "computer", "software", "position", "date", "company"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

Output:

Microsoft => company
Bill Gates => founder
Paul Allen => founder
April 4, 1975 => date
BASIC => software
Altair 8800 => computer
Microsoft => company
chairman => position
chief executive officer => position
president => position
chief software architect => position
May 2014 => date
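
The flat list above repeats entities that occur more than once (Microsoft appears twice). A small post-processing helper can group the predictions by label and drop duplicates. This is a sketch: `group_by_label` and `demo` are hypothetical names, and it assumes each entity dict may carry a `score` key and that `predict_entities` accepts a `threshold` argument, as in recent GLiNER releases.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Collect unique entity texts per label, in first-seen order.

    Entities without a "score" key are treated as fully confident.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent.get("score", 1.0) < min_score:
            continue
        if ent["text"] not in grouped[ent["label"]]:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


def demo() -> None:
    """Run the model end to end (downloads the weights on first use)."""
    from gliner import GLiNER

    model = GLiNER.from_pretrained("xomad/gliner-model-merge-large-v1.0")
    text = "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975."
    labels = ["founder", "date", "company"]
    # threshold is assumed to be the minimum confidence for a predicted span.
    entities = model.predict_entities(text, labels, threshold=0.5)
    for label, texts in group_by_label(entities).items():
        print(label, "=>", texts)
```

Call `demo()` to try it against the real model; `group_by_label` on its own works on any list of dicts with `text` and `label` keys.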

📊 Benchmarks:

Model performance

Performance on several zero-shot named entity recognition (NER) benchmarks (CrossNER, mit-movie, and mit-restaurant), with data taken from https://huggingface.co/knowledgator/gliner-multitask-large-v0.5:

| Model | F1 score |
| --- | --- |
| xomad/gliner-model-merge-large-v1.0 | 0.6601 |
| knowledgator/gliner-multitask-v0.5 | 0.6276 |
| numind/NuNER_Zero-span | 0.6196 |
| gliner-community/gliner_large-v2.5 | 0.615 |
| EmergentMethods/gliner_large_news-v2.1 | 0.5876 |
| urchade/gliner_large-v2.1 | 0.5754 |

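Detailed performance on each dataset:

| Model | Dataset | Precision | Recall | F1 score | F1 score (decimal) |
| --- | --- | --- | --- | --- | --- |
| xomad/gliner-model-merge-large-v1.0 | CrossNER_AI | 62.66% | 57.48% | 59.96% | 0.5996 |
| | CrossNER_literature | 73.28% | 66.42% | 69.68% | 0.6968 |
| | CrossNER_music | 74.89% | 70.67% | 72.72% | 0.7272 |
| | CrossNER_politics | 79.46% | 77.57% | 78.51% | 0.7851 |
| | CrossNER_science | 74.72% | 70.24% | 72.41% | 0.7241 |
| | mit-movie | 67.33% | 57.89% | 62.25% | 0.6225 |
| | mit-restaurant | 54.94% | 40.41% | 46.57% | 0.4657 |
| | Average | | | | 0.6601 |
| numind/NuNER_Zero-span | CrossNER_AI | 63.82% | 56.82% | 60.12% | 0.6012 |
| | CrossNER_literature | 73.53% | 58.06% | 64.89% | 0.6489 |
| | CrossNER_music | 72.69% | 67.40% | 69.95% | 0.6995 |
| | CrossNER_politics | 77.28% | 68.69% | 72.73% | 0.7273 |
| | CrossNER_science | 70.08% | 63.12% | 66.42% | 0.6642 |
| | mit-movie | 63.00% | 48.88% | 55.05% | 0.5505 |
| | mit-restaurant | 54.81% | 37.62% | 44.62% | 0.4462 |
| | Average | | | | 0.6196 |
| knowledgator/gliner-multitask-v0.5 | CrossNER_AI | 51.00% | 51.11% | 51.05% | 0.5105 |
| | CrossNER_literature | 72.65% | 65.62% | 68.96% | 0.6896 |
| | CrossNER_music | 74.91% | 73.70% | 74.30% | 0.7430 |
| | CrossNER_politics | 78.84% | 77.71% | 78.27% | 0.7827 |
| | CrossNER_science | 69.20% | 65.48% | 67.29% | 0.6729 |
| | mit-movie | 61.29% | 52.59% | 56.60% | 0.5660 |
| | mit-restaurant | 50.65% | 38.13% | 43.51% | 0.4351 |
| | Average | | | | 0.6276 |
| gliner-community/gliner_large-v2.5 | CrossNER_AI | 50.85% | 63.03% | 56.29% | 0.5629 |
| | CrossNER_literature | 64.92% | 67.21% | 66.04% | 0.6604 |
| | CrossNER_music | 70.88% | 73.10% | 71.97% | 0.7197 |
| | CrossNER_politics | 72.67% | 72.93% | 72.80% | 0.7280 |
| | CrossNER_science | 61.71% | 68.85% | 65.08% | 0.6508 |
| | mit-movie | 54.63% | 52.83% | 53.71% | 0.5371 |
| | mit-restaurant | 47.99% | 42.13% | 44.87% | 0.4487 |
| | Average | | | | 0.6154 |
| urchade/gliner_large-v2.1 | CrossNER_AI | 54.98% | 52.00% | 53.45% | 0.5345 |
| | CrossNER_literature | 59.33% | 56.47% | 57.87% | 0.5787 |
| | CrossNER_music | 67.39% | 66.77% | 67.08% | 0.6708 |
| | CrossNER_politics | 66.07% | 63.76% | 64.90% | 0.6490 |
| | CrossNER_science | 61.45% | 62.56% | 62.00% | 0.6200 |
| | mit-movie | 55.94% | 47.36% | 51.29% | 0.5129 |
| | mit-restaurant | 53.34% | 40.83% | 46.25% | 0.4625 |
| | Average | | | | 0.5754 |
| EmergentMethods/gliner_large_news-v2.1 | CrossNER_AI | 59.60% | 54.55% | 56.96% | 0.5696 |
| | CrossNER_literature | 65.41% | 56.16% | 60.44% | 0.6044 |
| | CrossNER_music | 67.47% | 63.08% | 65.20% | 0.6520 |
| | CrossNER_politics | 66.05% | 60.07% | 62.92% | 0.6292 |
| | CrossNER_science | 68.44% | 63.57% | 65.92% | 0.6592 |
| | mit-movie | 65.85% | 49.59% | 56.57% | 0.5657 |
| | mit-restaurant | 54.71% | 35.94% | 43.38% | 0.4338 |
| | Average | | | | 0.5876 |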
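
The headline score for each model appears to be the unweighted mean of its seven per-dataset F1 scores, and each F1 is the harmonic mean of precision and recall. The sketch below recomputes both for xomad/gliner-model-merge-large-v1.0 from the figures listed above; the function and variable names are illustrative, and recomputed F1 values can differ from the published ones in the last digit because the published precision and recall are already rounded.

```python
from statistics import mean


def f1_from_percent(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 as a decimal) for
# xomad/gliner-model-merge-large-v1.0, copied from the detailed results.
reported = {
    "CrossNER_AI": (62.66, 57.48, 0.5996),
    "CrossNER_literature": (73.28, 66.42, 0.6968),
    "CrossNER_music": (74.89, 70.67, 0.7272),
    "CrossNER_politics": (79.46, 77.57, 0.7851),
    "CrossNER_science": (74.72, 70.24, 0.7241),
    "mit-movie": (67.33, 57.89, 0.6225),
    "mit-restaurant": (54.94, 40.41, 0.4657),
}

# Unweighted (macro) average over the seven benchmarks.
macro_f1 = mean(f1 for _, _, f1 in reported.values())

# F1 recomputed from precision and recall, for comparison with the table.
recomputed = {name: f1_from_percent(p, r) for name, (p, r, _) in reported.items()}

print(f"macro F1: {macro_f1:.4f}")
for name, value in recomputed.items():
    print(f"{name}: {value:.2f}%")
```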

Authors

Hoan Nguyen, from xomad.com

Citations

@misc{wortsman2022modelsoupsaveragingweights,
      title={Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time}, 
      author={Mitchell Wortsman and Gabriel Ilharco and Samir Yitzhak Gadre and Rebecca Roelofs and Raphael Gontijo-Lopes and Ari S. Morcos and Hongseok Namkoong and Ali Farhadi and Yair Carmon and Simon Kornblith and Ludwig Schmidt},
      year={2022},
      eprint={2203.05482},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2203.05482}, 
}

@InProceedings{Wortsman_2022_CVPR,
    author    = {Wortsman, Mitchell and Ilharco, Gabriel and Kim, Jong Wook and Li, Mike and Kornblith, Simon and Roelofs, Rebecca and Lopes, Raphael Gontijo and Hajishirzi, Hannaneh and Farhadi, Ali and Namkoong, Hongseok and Schmidt, Ludwig},
    title     = {Robust Fine-Tuning of Zero-Shot Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {7959-7971}
}

@misc{stepanov2024gliner,
      title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks}, 
      author={Ihor Stepanov and Mykhailo Shtopko},
      year={2024},
      eprint={2406.12925},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer}, 
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}