bert-large-uncased-finetuned-ner

This model is a fine-tuned version of [bert-large-uncased] on the conll2003 dataset. It achieves the following results on the evaluation set:

Loss: 0.0778
Precision: 0.9505
Recall: 0.9575
F1: 0.9540
Accuracy: 0.9886
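
As a quick consistency check, the F1 score should be the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the variable names are illustrative only.

```python
# Recompute F1 from the reported precision and recall.
precision = 0.9505
recall = 0.9575

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.4f}")  # 0.9540
```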

Model description

More information needed

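Limitations and bias

This model is limited by its training dataset, which consists of entity-annotated news articles from a specific span of time. It may not generalize well to use cases in other domains. The model also occasionally tags subword tokens as entities, so the results may need post-processing to handle those cases.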
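
One possible form of that post-processing is sketched below. It assumes the pipeline returns one dict per token with `word`, `index`, `start`, `end`, and `score` keys, and that continuation pieces carry the WordPiece `##` prefix. The helper name `merge_subwords` and the sample input are hypothetical and hand-written for illustration; they are not real model output.

```python
def merge_subwords(tokens):
    """Fold WordPiece continuation tokens ("##...") into the preceding token."""
    merged = []
    for tok in tokens:
        is_continuation = (
            bool(merged)
            and tok["word"].startswith("##")
            and tok["index"] == merged[-1]["last_index"] + 1
        )
        if is_continuation:
            prev = merged[-1]
            prev["word"] += tok["word"][2:]   # drop the "##" marker
            prev["end"] = tok["end"]          # extend the character span
            prev["score"] = min(prev["score"], tok["score"])  # keep the weakest score
            prev["last_index"] = tok["index"]
        else:
            # A new word, or an orphaned "##" piece, which is kept unchanged.
            entry = dict(tok)
            entry["last_index"] = tok["index"]
            merged.append(entry)
    for entry in merged:
        del entry["last_index"]  # internal bookkeeping only
    return merged


# Hypothetical token-level output for a word split into two pieces.
sample = [
    {"entity": "B-LOC", "score": 0.99, "index": 4, "word": "scott", "start": 11, "end": 16},
    {"entity": "I-LOC", "score": 0.98, "index": 5, "word": "##sdale", "start": 16, "end": 21},
    {"entity": "B-LOC", "score": 0.97, "index": 9, "word": "ohio", "start": 30, "end": 34},
]
result = merge_subwords(sample)
print([r["word"] for r in result])  # ['scottsdale', 'ohio']
```

The merged entry keeps the label of its first piece and the lowest score among its pieces. That is a conservative choice, not something this model card prescribes.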

How to use

You can use this model with the openmind NER pipeline.

from openmind import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and token-classification model.
tokenizer = AutoTokenizer.from_pretrained("Changchun_Ascend/bert-large-uncased-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("Changchun_Ascend/bert-large-uncased-finetuned-ner")

# Build the NER pipeline and run it on a sample sentence.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Scott and I live in Ohio"
ner_results = nlp(example)
print(ner_results)

[{'entity': 'B-PER', 'score': 0.99951184, 'index': 4, 'word': 'scott', 'start': 11, 'end': 16}, {'entity': 'B-LOC', 'score': 0.9999815, 'index': 9, 'word': 'ohio', 'start': 31, 'end': 35}]
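
Because the model is uncased, the `word` field comes back lowercased. A minimal sketch of recovering the original surface form from the `start` and `end` character offsets follows; the results are copied from the output above so the snippet runs without loading the model.

```python
example = "My name is Scott and I live in Ohio"

# Results copied from the printed pipeline output above.
ner_results = [
    {"entity": "B-PER", "score": 0.99951184, "index": 4, "word": "scott", "start": 11, "end": 16},
    {"entity": "B-LOC", "score": 0.9999815, "index": 9, "word": "ohio", "start": 31, "end": 35},
]

# Slice the input text with the character offsets to restore the original casing.
spans = [(example[r["start"]:r["end"]], r["entity"]) for r in ner_results]
print(spans)  # [('Scott', 'B-PER'), ('Ohio', 'B-LOC')]
```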

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10
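
To illustrate what a linear scheduler does with these values, here is a minimal sketch of the learning rate over the run. It assumes no warmup and a decay to zero at the final step, neither of which this card states; the total of 8780 steps and the 878 steps per epoch are taken from the training results. The function name `linear_lr` is hypothetical.

```python
BASE_LR = 2e-05      # learning_rate from the hyperparameters
TOTAL_STEPS = 8780   # final step count reported in the training results
STEPS_PER_EPOCH = 878


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Decay linearly from base_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    remaining = max(0, total_steps - step) / total_steps
    return base_lr * remaining


# Learning rate at the start, midpoint, and end of training.
for epoch in (0, 5, 10):
    print(epoch, linear_lr(epoch * STEPS_PER_EPOCH))
```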

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
0.1997	1.0	878	0.0576	0.9316	0.9257	0.9286	0.9837
0.04	2.0	1756	0.0490	0.9400	0.9513	0.9456	0.9870
0.0199	3.0	2634	0.0557	0.9436	0.9540	0.9488	0.9879
0.0112	4.0	3512	0.0602	0.9443	0.9569	0.9506	0.9881
0.0068	5.0	4390	0.0631	0.9451	0.9589	0.9520	0.9882
0.0044	6.0	5268	0.0638	0.9510	0.9567	0.9538	0.9885
0.003	7.0	6146	0.0722	0.9495	0.9560	0.9527	0.9885
0.0016	8.0	7024	0.0762	0.9491	0.9595	0.9543	0.9887
0.0018	9.0	7902	0.0769	0.9496	0.9542	0.9519	0.9883
0.0009	10.0	8780	0.0778	0.9505	0.9575	0.9540	0.9886
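
The evaluation figures reported at the top of this card match the final epoch. The sketch below shows how the table could be scanned for the epoch with the best F1 and the epoch with the lowest validation loss. The values are copied from the table above; the tuple layout and variable names are illustrative.

```python
# (epoch, validation_loss, f1) copied from the training results table.
rows = [
    (1, 0.0576, 0.9286),
    (2, 0.0490, 0.9456),
    (3, 0.0557, 0.9488),
    (4, 0.0602, 0.9506),
    (5, 0.0631, 0.9520),
    (6, 0.0638, 0.9538),
    (7, 0.0722, 0.9527),
    (8, 0.0762, 0.9543),
    (9, 0.0769, 0.9519),
    (10, 0.0778, 0.9540),
]

best_f1 = max(rows, key=lambda r: r[2])
lowest_loss = min(rows, key=lambda r: r[1])

print("best F1:", best_f1)               # best F1: (8, 0.0762, 0.9543)
print("lowest val loss:", lowest_loss)   # lowest val loss: (2, 0.049, 0.9456)
```

Validation loss bottoms out at epoch 2 and rises afterwards while F1 keeps improving slightly, so the choice of checkpoint depends on which metric matters for the use case.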
