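FinBERT is a BERT model pre-trained on financial communication text, intended to advance research and practice in financial NLP. It was trained on three financial communication corpora totaling 4.9 billion tokens.
For more technical details on FinBERT, follow the link.
The finbert-tone model released here is FinBERT fine-tuned on 10,000 manually annotated sentences (labeled positive, negative, or neutral) taken from analyst reports. It achieves strong performance on financial tone analysis, so if financial tone analysis is all you need from FinBERT, this model is a good place to start.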
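Because finbert-tone is fine-tuned for three tone classes, the model produces three raw scores (logits) for each sentence. The predicted tone is the class with the highest softmax probability. The plain-Python sketch below illustrates only that final step. The label order (index 0 = neutral, 1 = positive, 2 = negative) is assumed from the comment in the usage example further down. The helper names `softmax` and `logits_to_tone` are hypothetical, and the logits are made-up values, not real model output.

```python
from __future__ import annotations

import math
from typing import Sequence

# Assumed label order, taken from the comment in the usage example:
# index 0 = neutral, 1 = positive, 2 = negative.
TONE_LABELS = ("neutral", "positive", "negative")


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_tone(logits: Sequence[float]) -> tuple[str, float]:
    """Return the highest-probability tone label and its probability."""
    if len(logits) != len(TONE_LABELS):
        raise ValueError(f"expected {len(TONE_LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TONE_LABELS[best], probs[best]


# Hypothetical logits for one sentence, for illustration only.
label, prob = logits_to_tone([0.2, 2.5, -1.0])
print(label, round(prob, 3))
```

The Transformers pipeline performs an equivalent softmax-and-argmax internally. This sketch only makes that step visible.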
If you use this model in academic work, please cite the following paper:
Huang, Allen H., Hui Wang, and Yi Yang. "FinBERT: A Large Language Model for Extracting Information from Financial Text." Contemporary Accounting Research (2022).
You can combine this model with a Transformers pipeline to perform sentiment analysis:
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline
# Load the fine-tuned three-class tone classifier and its tokenizer
finbert = BertForSequenceClassification.from_pretrained('Beijing-Ascend/finbert-tone', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('Beijing-Ascend/finbert-tone')
# Wrap model and tokenizer in a sentiment-analysis pipeline
nlp = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)
# Example sentences in the style of financial communication text
sentences = ["there is a shortage of capital, and we need extra financing",
             "growth is strong and we have plenty of liquidity",
             "there are doubts about our finances",
             "profits are flat"]
# One {'label': ..., 'score': ...} dict is returned per sentence
results = nlp(sentences)
print(results)  # LABEL_0: neutral; LABEL_1: positive; LABEL_2: negative
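You may want a single tone reading for a whole passage rather than per-sentence labels. One option is to normalize the pipeline's labels and aggregate them. The sketch below is a hypothetical helper, not part of FinBERT or Transformers. It assumes each result is a dict with a "label" key, which is the shape the sentiment-analysis pipeline returns. It maps LABEL_0/1/2 according to the comment above and lowercases any other label name. The net-tone score it computes is an illustrative choice: positive count minus negative count, divided by the number of sentences.

```python
from __future__ import annotations

from collections import Counter

# Mapping from the comment in the usage example. Labels not listed here
# (e.g. readable names like "Negative") are simply lowercased.
LABEL_MAP = {"LABEL_0": "neutral", "LABEL_1": "positive", "LABEL_2": "negative"}


def normalize_label(raw: str) -> str:
    """Map a generic LABEL_i name to a tone word; lowercase anything else."""
    return LABEL_MAP.get(raw, raw.lower())


def summarize_tone(results: list[dict]) -> dict:
    """Count tones and compute a hypothetical net-tone score in [-1, 1]."""
    counts = Counter(normalize_label(r["label"]) for r in results)
    total = sum(counts.values())
    net = (counts["positive"] - counts["negative"]) / total if total else 0.0
    return {"counts": dict(counts), "net_tone": net}


# Hypothetical pipeline-shaped results, for illustration only (not model output).
demo = [
    {"label": "LABEL_2", "score": 0.9},
    {"label": "LABEL_1", "score": 0.8},
    {"label": "Negative", "score": 0.7},
    {"label": "LABEL_0", "score": 0.6},
]
print(summarize_tone(demo))
```

Applied to the `results` list from the example above, `summarize_tone(results)` would combine the four sentence-level predictions in the same way. The scores are ignored here. Weighting each sentence by its score is an equally reasonable design choice.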