How to use

import argparse
import time

from openmind import pipeline, is_torch_npu_available


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="zhouhui/distilroberta-finetuned-financial-text-classification",
    )
    return parser.parse_args()


args = parse_args()
model_path = args.model_name_or_path

# Use the first Ascend NPU when available, otherwise fall back to CPU.
device = "npu:0" if is_torch_npu_available() else "cpu"
# device = "cpu"  # uncomment to force CPU

start_time = time.time()
# The timed region covers both model loading and inference.
classifier = pipeline("text-classification", model=model_path, device=device)
# Classify an example financial sentence (illustrative input).
print(classifier("The company's quarterly profit rose 20% year over year."))
end_time = time.time()
print(f"Hardware: {device}, inference time: {end_time - start_time} seconds")
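The pipeline call above returns a label together with a score. As a rough sketch of how such a result is typically derived from a three-class model's raw logits, the snippet below applies a softmax and picks the top class. The label order in `LABELS` and the logit values are assumptions for illustration; the real mapping is defined by the model's configuration.

```python
import math

# Assumed label order; the actual mapping comes from the model's config.
LABELS = ["negative", "neutral", "positive"]


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, labels=LABELS):
    """Return the most probable label in a pipeline-like dict shape."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


# Made-up logits, for illustration only.
result = top_label([-1.2, 0.3, 2.5])
print(result)
```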

distilroberta-finetuned-financial-text-classification

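This model is a fine-tuned version of distilroberta-base, trained on the financial-phrasebank dataset (sentence_50Agree subset) combined with a Kaggle dataset. The financial-phrasebank data contains 4,840 financial news items labeled by sentiment (negative, neutral, positive). The Kaggle dataset contains Covid-19 sentiment data and is available as sentiment-classification-selflabel-dataset.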

It achieves the following results on the evaluation set:

Loss: 0.4463
F1: 0.8835
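For readers unfamiliar with the metric, here is a minimal sketch of a multi-class F1 computation. The card does not say which averaging mode was used, so the macro average below is an assumption, and the example labels are invented.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


# Invented labels, for illustration only.
y_true = ["positive", "neutral", "negative", "neutral", "positive", "neutral"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "neutral"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```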

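Model description

The model determines the financial sentiment of a given text. Because the class labels are unevenly distributed, the class weights were adjusted to give more attention to labels with fewer samples, which helps overall performance. The Covid dataset was added to enrich the model, since most models have not been trained on the impact of Covid-19 on earnings or markets.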
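The card does not state how the weights were adjusted. One common approach, sketched here as an assumption rather than the authors' method, is inverse-frequency ("balanced") class weighting, where class c gets weight N / (K * n_c) and that weight scales the cross-entropy loss. The label counts below are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by N / (K * n_c), so rarer classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(probs, target, weights):
    """Loss for one example: negative log-likelihood scaled by class weight."""
    return -weights[target] * math.log(probs[target])


# Hypothetical, deliberately imbalanced label counts (not the real dataset).
labels = ["neutral"] * 60 + ["positive"] * 30 + ["negative"] * 10
weights = balanced_class_weights(labels)
print(weights)

# The same predicted probability costs more when the true class is rare.
probs = {"negative": 0.5, "neutral": 0.5, "positive": 0.5}
print(weighted_cross_entropy(probs, "negative", weights))
print(weighted_cross_entropy(probs, "neutral", weights))
```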

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10
mixed_precision_training: Native AMP
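To make the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (the card does not mention warmup) and 720 total steps, which is the final step count reported in the training results; `linear_lr` is a hypothetical helper, not part of any library.

```python
def linear_lr(step, base_lr=2e-5, total_steps=720, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, at the end of each epoch boundary shown, and at the end.
for step in (0, 72, 360, 720):
    print(step, linear_lr(step))
```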

Training results

Training Loss	Epoch	Step	Validation Loss	F1
0.7309	1.0	72	0.3671	0.8441
0.3757	2.0	144	0.3199	0.8709
0.3054	3.0	216	0.3096	0.8678
0.2229	4.0	288	0.3776	0.8390
0.1744	5.0	360	0.3678	0.8723
0.1436	6.0	432	0.3728	0.8758
0.1044	7.0	504	0.4116	0.8744
0.0931	8.0	576	0.4148	0.8761
0.0683	9.0	648	0.4423	0.8837
0.0611	10.0	720	0.4463	0.8835
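A short script can help read the table. The sketch below copies the rows above and finds the epoch with the lowest validation loss and the epoch with the highest F1. In these numbers, validation loss bottoms out early while training loss keeps falling, which is often read as a sign of overfitting, although F1 still improves slightly in later epochs.

```python
# (training_loss, epoch, step, validation_loss, f1), copied from the table above.
ROWS = [
    (0.7309, 1, 72, 0.3671, 0.8441),
    (0.3757, 2, 144, 0.3199, 0.8709),
    (0.3054, 3, 216, 0.3096, 0.8678),
    (0.2229, 4, 288, 0.3776, 0.8390),
    (0.1744, 5, 360, 0.3678, 0.8723),
    (0.1436, 6, 432, 0.3728, 0.8758),
    (0.1044, 7, 504, 0.4116, 0.8744),
    (0.0931, 8, 576, 0.4148, 0.8761),
    (0.0683, 9, 648, 0.4423, 0.8837),
    (0.0611, 10, 720, 0.4463, 0.8835),
]

best_val = min(ROWS, key=lambda row: row[3])  # lowest validation loss
best_f1 = max(ROWS, key=lambda row: row[4])   # highest F1
steps_per_epoch = ROWS[0][2] // ROWS[0][1]

print("lowest validation loss at epoch", best_val[1])
print("highest F1 at epoch", best_f1[1])
print("optimizer steps per epoch:", steps_per_epoch)
```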

Framework versions

Transformers 4.15.0
PyTorch 1.10.0+cu111
Datasets 1.18.0
Tokenizers 0.10.3
