German Sentiment Classification with Bert

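This model was trained for sentiment classification of German-language texts. For best results, every model input must be preprocessed with the same procedure that was applied during training. To make the model easier to use, we provide a Python package that bundles the code needed for preprocessing and inference.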
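The point of the preprocessing requirement is that the same deterministic normalization runs at training time and at inference time. The function below is a hypothetical sketch of what such a step can look like. The name `clean_text` and every rule in it (dropping URLs and @mentions, keeping only Latin letters, umlauts and ß, collapsing whitespace, lowercasing) are illustrative assumptions and are not guaranteed to match the steps inside the germansentiment package. Use the package itself for real predictions.

```python
import re

# Hypothetical normalization rules; assumptions for illustration only,
# not necessarily what germansentiment does internally.
_URL = re.compile(r"https?://\S+")
_MENTION = re.compile(r"@\S+")
_NON_TEXT = re.compile(r"[^A-Za-zäöüÄÖÜß ]")


def clean_text(text: str) -> str:
    """Normalize a raw input string before classification (sketch)."""
    text = text.replace("\n", " ")   # treat line breaks as spaces
    text = _URL.sub("", text)        # drop links
    text = _MENTION.sub("", text)    # drop @mentions
    text = _NON_TEXT.sub("", text)   # keep letters (incl. umlauts, ß) and spaces; digits and punctuation go
    text = " ".join(text.split())    # collapse runs of whitespace
    return text.lower()


# clean_text("Total awesome!!! @user https://example.com")  # → "total awesome"
```

Whatever the exact rules are, applying an identical function to both training and inference inputs is what keeps the model from seeing text in a form it was never trained on.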

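The model uses Google's Bert architecture and was trained on 1.834 million German-language samples. The training data contains texts from various domains, such as Twitter, Facebook, and reviews of movies, apps, and hotels. You can find more information about the dataset and the training process in the paper.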

Using the Python package

First, install the package from PyPI:

pip install germansentiment

from germansentiment import SentimentModel

model = SentimentModel()

texts = [
    "Mit keinem guten Ergebniss","Das ist gar nicht mal so gut",
    "Total awesome!","nicht so schlecht wie erwartet",
    "Der Test verlief positiv.","Sie fährt ein grünes Auto."]
result = model.predict_sentiment(texts)
print(result)

The code above prints the following list:

["negative","negative","positive","positive","neutral", "neutral"]

Output class probabilities

from germansentiment import SentimentModel

model = SentimentModel()

classes, probabilities = model.predict_sentiment(["das ist super"], output_probabilities=True)
print(classes, probabilities)

['positive'] [[['positive', 0.9761366844177246], ['negative', 0.023540444672107697], ['neutral', 0.00032294404809363186]]]
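The probabilities come back as one list of `[label, probability]` pairs per input text. If you want to look up a label's score directly or check how confident the top prediction is, a small helper can convert that structure into dictionaries. `to_dicts` and `top_label` are hypothetical helpers written for this sketch and are not part of the germansentiment package. The sample data is copied from the output shown above.

```python
from typing import Dict, List, Sequence, Tuple


def to_dicts(probabilities: Sequence[Sequence[Sequence]]) -> List[Dict[str, float]]:
    """Turn each text's [[label, prob], ...] list into a {label: prob} dict."""
    return [{label: float(p) for label, p in pairs} for pairs in probabilities]


def top_label(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable label together with its probability."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Output copied from the predict_sentiment example above.
probabilities = [[["positive", 0.9761366844177246],
                  ["negative", 0.023540444672107697],
                  ["neutral", 0.00032294404809363186]]]

scores = to_dicts(probabilities)
label, p = top_label(scores[0])  # ("positive", 0.9761366844177246)
```

Having the scores as dicts makes it easy to, for example, treat predictions below some confidence threshold of your own choosing as uncertain.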

Model and data

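If you are interested in the code and data used to train this model, please have a look at this repository and our paper. The table below lists the model's F1 scores on the individual datasets. Because this model was trained with a newer version of the transformers library, its results are slightly better than those reported in the paper.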

Dataset	Micro F1 score
holidaycheck	0.9568
scare	0.9418
filmstarts	0.9021
germeval	0.7536
PotTS	0.6780
emotions	0.9649
sb10k	0.7376
Leipzig Wikipedia Corpus 2016	0.9967
all	0.9639
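The scores in the table are micro-averaged F1. The following is a sketch of how that metric is defined for single-label, multi-class predictions. It is not the evaluation code from the repository or the paper. True positives, false positives and false negatives are summed over all classes before precision and recall are computed. The function name `micro_f1` is chosen for this example.

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro-averaged F1 for single-label multi-class predictions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for label in set(gold) | set(pred):
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1  # correct prediction of this class
            elif p == label:
                fp += 1  # predicted this class, gold is another one
            elif g == label:
                fn += 1  # gold is this class, predicted another one
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["positive", "negative", "neutral", "neutral"]
pred = ["positive", "neutral", "neutral", "negative"]
score = micro_f1(gold, pred)  # 0.5, equal to the accuracy of 2/4
```

Every example has exactly one gold label and one predicted label. So each error counts once as a false positive for the predicted class and once as a false negative for the gold class. As a result, micro precision, micro recall and micro F1 all equal plain accuracy in this setting.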

Citation

For feedback and questions, please contact me by email. If you find this model useful, please cite us:

@InProceedings{guhr-EtAl:2020:LREC,
  author    = {Guhr, Oliver  and  Schumann, Anne-Kathrin  and  Bahrmann, Frank  and  Böhme, Hans Joachim},
  title     = {Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems},
  booktitle      = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month          = {May},
  year           = {2020},
  address        = {Marseille, France},
  publisher      = {European Language Resources Association},
  pages     = {1620--1625},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.202}
}