HuggingFace Mirror/luke-japanese-base-finetuned-ner-openmind

This model is a fine-tuned version of luke-japanese-base for named entity recognition (NER).

It was fine-tuned on the Wikipedia-based Japanese NER dataset provided by Stockmark Inc. (https://github.com/stockmarkteam/ner-wikipedia-dataset).

You can use this model for NER tasks.

Model accuracy

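                             precision  recall  f1-score  support
Other organization name           0.76    0.77      0.77      238
Event name                        0.83    0.90      0.87      215
Person name                       0.88    0.91      0.90      546
Place name                        0.84    0.83      0.83      440
Political organization name       0.80    0.84      0.82      263
Facility name                     0.78    0.83      0.80      241
Corporation name                  0.88    0.90      0.89      487
Product name                      0.74    0.80      0.77      252
micro avg                         0.83    0.86      0.84     2682
macro avg                         0.81    0.85      0.83     2682
weighted avg                      0.83    0.86      0.84     2682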
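
As a reading aid, the short sketch below recomputes the macro and weighted averages from the per-class rows of the table above. It assumes the usual definitions (macro = unweighted mean over classes, weighted = mean weighted by support); the helper names `macro_avg` and `weighted_avg` are illustrative and not part of the model's code. Because the inputs are already rounded to two decimals, the results only match the table after rounding.

```python
# Per-class rows copied from the table: (precision, recall, f1-score, support).
rows = {
    "Other organization name": (0.76, 0.77, 0.77, 238),
    "Event name": (0.83, 0.90, 0.87, 215),
    "Person name": (0.88, 0.91, 0.90, 546),
    "Place name": (0.84, 0.83, 0.83, 440),
    "Political organization name": (0.80, 0.84, 0.82, 263),
    "Facility name": (0.78, 0.83, 0.80, 241),
    "Corporation name": (0.88, 0.90, 0.89, 487),
    "Product name": (0.74, 0.80, 0.77, 252),
}

SUPPORT = 3  # index of the support column in each row


def macro_avg(rows, col):
    """Unweighted mean of one metric column over all classes."""
    return sum(r[col] for r in rows.values()) / len(rows)


def weighted_avg(rows, col):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(r[SUPPORT] for r in rows.values())
    return sum(r[col] * r[SUPPORT] for r in rows.values()) / total


for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name}: macro={macro_avg(rows, col):.2f} "
          f"weighted={weighted_avg(rows, col):.2f}")
```

The weighted average leans toward the larger classes (person and corporation names), which is why it sits slightly above the macro average here.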

Use in Openmind

from openmind import pipeline, AutoTokenizer, is_torch_npu_available
from transformers import AutoModelForTokenClassification
import argparse
import time

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="models/luke-japanese-base-finetuned-ner",
    )
    args = parser.parse_args()
    return args

def main():
    args = parse_args()
    model_path = args.model_name_or_path

    if is_torch_npu_available():
        device = "npu:0"
    else:
        device = "cpu"
    
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path).to(device)
    
    # Note: the timer covers pipeline construction as well as inference.
    start_time = time.time()

    pipe = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple", device_map=device)
    result = pipe('昨日は東京で買い物をした')  # "I went shopping in Tokyo yesterday"

    print(result)

    end_time = time.time()
    print(f"Hardware: {device}, inference time: {end_time - start_time} s")
    
if __name__ == "__main__":
    main()

How to use

Install sentencepiece and transformers (pip install sentencepiece, pip install transformers), then run the following code to perform NER.

from transformers import MLukeTokenizer, pipeline, LukeForTokenClassification

tokenizer = MLukeTokenizer.from_pretrained('Mizuiro-sakura/luke-japanese-base-finetuned-ner')
model = LukeForTokenClassification.from_pretrained('Mizuiro-sakura/luke-japanese-base-finetuned-ner')  # load the fine-tuned model

text = '昨日は東京で買い物をした'  # "I went shopping in Tokyo yesterday"

ner = pipeline('ner', model=model, tokenizer=tokenizer)

result = ner(text)
print(result)
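
The pipeline returns a list of dictionaries, one per predicted token or entity. The following is a minimal post-processing sketch, assuming each dictionary carries a `word`, a `score`, and either an `entity_group` key (aggregated output) or an `entity` key (per-token output). The helper `group_entities`, the threshold, and the sample data (including the label names `LOC` and `ORG`) are hypothetical and only illustrate the shape of the result; they are not actual output of this model.

```python
from collections import defaultdict


def group_entities(predictions, min_score=0.5):
    """Group predicted words by label, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for pred in predictions:
        # Aggregated output uses "entity_group"; per-token output uses "entity".
        label = pred.get("entity_group") or pred.get("entity")
        if label is None or pred["score"] < min_score:
            continue
        grouped[label].append(pred["word"])
    return dict(grouped)


# Hypothetical predictions for illustration only; labels and scores are made up.
sample = [
    {"entity_group": "LOC", "score": 0.98, "word": "東京", "start": None, "end": None},
    {"entity_group": "ORG", "score": 0.31, "word": "買い物", "start": None, "end": None},
]

print(group_entities(sample))
```

The sketch relies on `word` rather than on `start`/`end`, since character offsets may be missing when the tokenizer does not provide an offset mapping.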

What is LUKE? [1]

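LUKE (Language Understanding with Knowledge-based Embeddings) is a pre-trained contextualized representation of words and entities based on the Transformer. LUKE treats the words and entities in a given text as independent tokens and outputs contextualized representations of them. It adopts an entity-aware self-attention mechanism, an extension of the Transformer's self-attention that takes the token types (word or entity) into account when computing attention scores.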
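
To make the entity-aware self-attention idea concrete, here is a minimal pure-Python sketch of the score computation. It assumes, following the LUKE paper [1], that the key projection is shared while a separate query projection is chosen for each (attending token type, attended token type) pair. The function names, the toy 2-dimensional inputs, and the scaled-identity matrices are illustrative assumptions; this is not the library's implementation.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entity_aware_attention_scores(xs, types, key, queries):
    """Scaled dot-product scores with a type-dependent query matrix.

    xs: token vectors; types: "word" or "entity" per token;
    key: shared key matrix; queries: dict keyed by (type_i, type_j).
    """
    keys = [matvec(key, x) for x in xs]
    scale = math.sqrt(len(keys[0]))
    scores = []
    for x_i, t_i in zip(xs, types):
        row = []
        for k_j, t_j in zip(keys, types):
            # The query projection depends on both token types.
            q = matvec(queries[(t_i, t_j)], x_i)
            row.append(sum(a * b for a, b in zip(q, k_j)) / scale)
        scores.append(row)
    return scores


def scaled_identity(c, dim=2):
    """Toy projection matrix: c times the identity."""
    return [[c if r == col else 0.0 for col in range(dim)] for r in range(dim)]


xs = [[1.0, 1.0], [1.0, 1.0]]   # one word token, one entity token
types = ["word", "entity"]
key = scaled_identity(1.0)
queries = {
    ("word", "word"): scaled_identity(1.0),
    ("word", "entity"): scaled_identity(2.0),
    ("entity", "word"): scaled_identity(3.0),
    ("entity", "entity"): scaled_identity(4.0),
}

scores = entity_aware_attention_scores(xs, types, key, queries)
weights = [softmax(row) for row in scores]
print(weights)
```

Even though both toy tokens have identical vectors, the scores differ because the query matrix changes with the token-type pair, which is the point of the mechanism.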

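LUKE achieved state-of-the-art results on five popular NLP benchmarks: SQuAD v1.1 (extractive question answering), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), TACRED (relation classification), and Open Entity (entity typing).

luke-japanese is the Japanese version of LUKE, a knowledge-enhanced pre-trained Transformer model of words and entities. Like the original, it treats words and entities as independent tokens and outputs representations that take their context into account.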

Acknowledgments

I would like to thank the developers of LUKE, Mr. Yamada (@ikuyamada) and Studio Ousia (@StudioOusia).

Citation

[1] @inproceedings{yamada2020luke,
  title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
  author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
  booktitle={EMNLP},
  year={2020}
}