
contract-ner-model-da

This model is a fine-tuned version of xlm-roberta-base on a custom contract dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0026
  • Micro F1: 0.9297
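The card does not say how micro F1 was computed. A common convention for NER, assumed here, is entity-level exact matching: true positives, false positives and false negatives are pooled over all entity types and sentences before computing precision, recall and F1 (macro averaging would instead average per-type scores). A minimal sketch of that calculation, with spans written as hypothetical (type, start, end) tuples and made-up entity types:

```python
from typing import List, Set, Tuple

# Hypothetical span format: (entity type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def micro_prf(gold: List[Set[Span]], pred: List[Set[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        tp += len(gold_spans & pred_spans)  # predicted spans that exactly match a gold span
        fp += len(pred_spans - gold_spans)  # predicted spans with no exact gold match
        fn += len(gold_spans - pred_spans)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: one boundary error on the DATE span in the first sentence.
gold = [{("PARTY", 0, 2), ("DATE", 5, 7)}, {("SALARY", 3, 4)}]
pred = [{("PARTY", 0, 2), ("DATE", 5, 6)}, {("SALARY", 3, 4)}]
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```

Under exact matching, a wrong boundary counts as both a false positive and a false negative, so this metric is stricter than per-token accuracy.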

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

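  • Learning rate: 2e-05
  • Train batch size: 8
  • Eval batch size: 8
  • Seed: 42
  • Gradient accumulation steps: 4
  • Total train batch size: 32
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR scheduler type: linear
  • LR scheduler warmup steps: 919
  • Number of epochs: 500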
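Two of these settings interact. The total train batch size of 32 is the per-device batch size (8) multiplied by the gradient accumulation steps (4), assuming a single device. The linear scheduler ramps the learning rate from 0 up to 2e-05 over the first 919 optimizer steps and then decays it linearly to 0 at the final step. The sketch below reproduces that arithmetic in plain Python, following the same formula as `get_linear_schedule_with_warmup` in transformers; `TOTAL_STEPS` is a hypothetical value, because the card does not report the total number of optimizer steps.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update (single device assumed by default)."""
    return per_device * grad_accum * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 2e-5
WARMUP_STEPS = 919
TOTAL_STEPS = 10_000  # hypothetical; not stated in the card

print(effective_batch_size(8, 4))
for step in (0, 459, WARMUP_STEPS, 5_000, TOTAL_STEPS):
    print(step, linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS))
```

The peak learning rate is reached exactly at the end of warmup; every later step lowers it by the same amount.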

Usage

import argparse
import time

import torch
# Assumption: openmind re-exports AutoModelForTokenClassification alongside its other
# Auto classes, mirroring transformers; if your version does not, import it from transformers.
from openmind import AutoModelForTokenClassification, AutoTokenizer, is_torch_npu_available


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="zhouhui/employment-contract-ner-da",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Use the first Ascend NPU if one is available, otherwise fall back to CPU.
    device = "npu:0" if is_torch_npu_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    # This checkpoint is a NER model, so it needs a token-classification head
    # (one label per token), not a sequence-classification head.
    model = AutoModelForTokenClassification.from_pretrained(
        model_path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True
    )

    # Illustrative input only; replace it with your own contract text.
    text = "This employment contract is entered into between Acme A/S and Jane Doe."

    start_time = time.time()
    inputs = tokenizer(text, truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # shape: (num_tokens, num_labels)

    # Highest-scoring label per sub-word token; label names come from the model config.
    label_ids = logits.argmax(dim=-1).tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    for token, label_id in zip(tokens, label_ids):
        if token in tokenizer.all_special_tokens:
            continue  # skip <s>, </s> and other special tokens
        print(token, model.config.id2label[label_id])
    end_time = time.time()
    print(f"Device: {device}, inference time: {end_time - start_time:.3f} s")


if __name__ == "__main__":
    main()
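The usage script prints one label per sub-word token. To turn those labels into entity spans, one common approach is to group consecutive BIO tags. This assumes the model's `id2label` uses tags of the form `B-<TYPE>`, `I-<TYPE>` and `O`; the card does not list its label set, and `bio_to_spans` is a hypothetical helper, not part of openmind or transformers.

```python
from typing import List, Optional, Tuple


def bio_to_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group per-token BIO tags into (entity_type, start, end_exclusive) spans.

    A stray I- tag that does not continue the open span starts a new span.
    Tags that are neither B-, I- nor O are ignored.
    """
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, label in enumerate(labels + ["O"]):  # sentinel "O" closes the last open span
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type
        )
        if starts_new or label == "O":
            if current_type is not None:
                spans.append((current_type, start, i))
                current_type = None
            if starts_new:
                current_type, start = label[2:], i
    return spans


print(bio_to_spans(["B-PARTY", "I-PARTY", "O", "B-DATE", "I-DATE", "I-DATE"]))
```

The indices refer to positions in the label list, so the entity text is recovered by joining the matching tokens; XLM-R's SentencePiece tokens mark the start of a word with `▁`.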

Training results

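| Training loss | Epoch | Step | Validation loss | Micro F1 |
|---------------|-------|------|-----------------|----------|
| 0.8971        | 0.24  | 200  | 0.0205          | 0.0      |
| 0.0173        | 0.48  | 400  | 0.0100          | 0.2921   |
| 0.0092        | 0.73  | 600  | 0.0065          | 0.7147   |
| 0.0063        | 0.97  | 800  | 0.0046          | 0.8332   |
| 0.0047        | 1.21  | 1000 | 0.0047          | 0.8459   |
| 0.0042        | 1.45  | 1200 | 0.0039          | 0.8694   |
| 0.0037        | 1.69  | 1400 | 0.0035          | 0.8888   |
| 0.0032        | 1.93  | 1600 | 0.0035          | 0.8840   |
| 0.0025        | 2.18  | 1800 | 0.0029          | 0.8943   |
| 0.0023        | 2.42  | 2000 | 0.0024          | 0.9104   |
| 0.0023        | 2.66  | 2200 | 0.0032          | 0.8808   |
| 0.0021        | 2.9   | 2400 | 0.0022          | 0.9338   |
| 0.0018        | 3.14  | 2600 | 0.0020          | 0.9315   |
| 0.0015        | 3.39  | 2800 | 0.0026          | 0.9297   |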
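Reading the log: the evaluation results at the top of this card (loss 0.0026, micro F1 0.9297) match the last logged row at step 2800, while the highest micro F1 in the log is 0.9338 at step 2400 and the lowest validation loss is 0.0020 at step 2600. The log also stops at epoch 3.39, far short of the configured 500 epochs; the card does not explain why. The snippet below uses only the values from the training-results table to pick those rows and to estimate optimizer steps per epoch from the last row (an approximation, since the logged epochs are rounded).

```python
# (step, epoch, training loss, validation loss, micro F1), copied from the training-results table.
LOG = [
    (200, 0.24, 0.8971, 0.0205, 0.0),
    (400, 0.48, 0.0173, 0.0100, 0.2921),
    (600, 0.73, 0.0092, 0.0065, 0.7147),
    (800, 0.97, 0.0063, 0.0046, 0.8332),
    (1000, 1.21, 0.0047, 0.0047, 0.8459),
    (1200, 1.45, 0.0042, 0.0039, 0.8694),
    (1400, 1.69, 0.0037, 0.0035, 0.8888),
    (1600, 1.93, 0.0032, 0.0035, 0.8840),
    (1800, 2.18, 0.0025, 0.0029, 0.8943),
    (2000, 2.42, 0.0023, 0.0024, 0.9104),
    (2200, 2.66, 0.0023, 0.0032, 0.8808),
    (2400, 2.9, 0.0021, 0.0022, 0.9338),
    (2600, 3.14, 0.0018, 0.0020, 0.9315),
    (2800, 3.39, 0.0015, 0.0026, 0.9297),
]

best = max(LOG, key=lambda row: row[4])        # row with the highest micro F1
lowest_val = min(LOG, key=lambda row: row[3])  # row with the lowest validation loss
last_step, last_epoch = LOG[-1][0], LOG[-1][1]
steps_per_epoch = last_step / last_epoch       # approximate: logged epochs are rounded

print(f"best micro F1: {best[4]} at step {best[0]}")
print(f"lowest validation loss: {lowest_val[3]} at step {lowest_val[0]}")
print(f"~{steps_per_epoch:.0f} optimizer steps per epoch")
```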

Framework versions

  • Transformers 4.11.3
  • Pytorch 1.8.1+cu101
  • Datasets 1.12.1
  • Tokenizers 0.10.3