This model was obtained by fine-tuning luke-japanese-base on a Wikipedia-based Japanese named entity recognition dataset (provided by Stockmark Inc., https://github.com/stockmarkteam/ner-wikipedia-dataset).
It can be used for named entity recognition (NER) tasks.

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Other organization | 0.76 | 0.77 | 0.77 | 238 |
| Event | 0.83 | 0.90 | 0.87 | 215 |
| Person | 0.88 | 0.91 | 0.90 | 546 |
| Location | 0.84 | 0.83 | 0.83 | 440 |
| Political organization | 0.80 | 0.84 | 0.82 | 263 |
| Facility | 0.78 | 0.83 | 0.80 | 241 |
| Corporation | 0.88 | 0.90 | 0.89 | 487 |
| Product | 0.74 | 0.80 | 0.77 | 252 |
| micro avg | 0.83 | 0.86 | 0.84 | 2682 |
| macro avg | 0.81 | 0.85 | 0.83 | 2682 |
| weighted avg | 0.83 | 0.86 | 0.84 | 2682 |
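
To make the averaged rows easier to read, here is a minimal sketch that recomputes the macro and weighted averages from the per-class rows of the table. The English class labels and the helper names `macro_average` and `weighted_average` are illustrative and not part of the model. Because the per-class values are already rounded to two decimals, the recomputed averages can differ from the reported ones in the last digit.

```python
# Per-class (precision, recall, f1-score, support), copied from the table above.
# The English labels are translations used only for this illustration.
per_class = {
    "Other organization": (0.76, 0.77, 0.77, 238),
    "Event": (0.83, 0.90, 0.87, 215),
    "Person": (0.88, 0.91, 0.90, 546),
    "Location": (0.84, 0.83, 0.83, 440),
    "Political organization": (0.80, 0.84, 0.82, 263),
    "Facility": (0.78, 0.83, 0.80, 241),
    "Corporation": (0.88, 0.90, 0.89, 487),
    "Product": (0.74, 0.80, 0.77, 252),
}


def macro_average(rows):
    """Unweighted mean of precision, recall and f1 over all classes."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in range(3))


def weighted_average(rows):
    """Mean of precision, recall and f1, weighted by each class's support."""
    total = sum(row[3] for row in rows)
    return tuple(sum(row[i] * row[3] for row in rows) / total for i in range(3))


rows = list(per_class.values())
macro = macro_average(rows)
weighted = weighted_average(rows)
total_support = sum(row[3] for row in rows)

print("macro    P/R/F1:", ["%.3f" % x for x in macro])
print("weighted P/R/F1:", ["%.3f" % x for x in weighted])
print("total support:", total_support)
```

The macro average treats every entity type equally, so low-support classes such as Event count as much as Person, while the weighted average is dominated by the frequent classes.
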

The following script runs inference through the openMind library, using an NPU when one is available and the CPU otherwise.

    import argparse
    import time

    from openmind import pipeline, AutoTokenizer, is_torch_npu_available
    from transformers import AutoModelForTokenClassification


    def parse_args():
        parser = argparse.ArgumentParser()
        parser.add_argument(
            "--model_name_or_path",
            type=str,
            help="Path to model",
            default="models/luke-japanese-base-finetuned-ner",
        )
        args = parser.parse_args()
        return args


    def main():
        args = parse_args()
        model_path = args.model_name_or_path

        # Use the NPU when it is available, otherwise fall back to the CPU.
        if is_torch_npu_available():
            device = "npu:0"
        else:
            device = "cpu"

        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForTokenClassification.from_pretrained(model_path).to(device)

        # Time pipeline construction and inference (model loading is excluded).
        start_time = time.time()
        pipe = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple", device_map=device)
        result = pipe('昨日は東京で買い物をした')
        print(result)
        end_time = time.time()
        print(f"Hardware: {device}, inference time: {end_time - start_time} seconds")


    if __name__ == "__main__":
        main()

Install sentencepiece and transformers (`pip install sentencepiece`, `pip install transformers`), then run the following code to perform NER.

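
    from transformers import MLukeTokenizer, pipeline, LukeForTokenClassification

    tokenizer = MLukeTokenizer.from_pretrained('Mizuiro-sakura/luke-japanese-base-finetuned-ner')
    model = LukeForTokenClassification.from_pretrained('Mizuiro-sakura/luke-japanese-base-finetuned-ner')  # load the fine-tuned model

    text = '昨日は東京で買い物をした'

    ner = pipeline('ner', model=model, tokenizer=tokenizer)
    result = ner(text)
    print(result)

LUKE (Language Understanding with Knowledge-based Embeddings) [1] is a new pre-trained contextualized representation of words and entities based on the Transformer. LUKE treats the words and entities in a given text as independent tokens and outputs contextualized representations of them. It adopts an entity-aware self-attention mechanism, an extension of the Transformer's self-attention mechanism that takes the type of each token (word or entity) into account when computing attention scores.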
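LUKE achieves state-of-the-art results on five popular NLP benchmarks: SQuAD v1.1 (extractive question answering), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), TACRED (relation classification), and Open Entity (entity typing). luke-japanese is the Japanese version of LUKE, a knowledge-enhanced pre-trained Transformer model of words and entities.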
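
The entity-aware self-attention described above can be illustrated with a small, dependency-free sketch. It assumes a single attention head, square projection matrices, random weights, and no masking or layer stacking, so it is a toy illustration rather than the LUKE implementation. The only point it shows is that the query matrix is chosen per pair of token types (word or entity) while keys and values are shared. All names (`entity_aware_attention`, `queries`, and so on) are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_aware_attention(hidden, token_types, queries, key, value):
    """Single-head self-attention whose query projection depends on token types.

    hidden:      list of token vectors
    token_types: "word" or "entity" for each position
    queries:     dict mapping (attending type, attended type) to a query matrix
    key, value:  projection matrices shared by all tokens
    """
    dim = len(hidden[0])
    keys = [matvec(key, h) for h in hidden]
    values = [matvec(value, h) for h in hidden]
    outputs = []
    for i, h_i in enumerate(hidden):
        scores = []
        for j in range(len(hidden)):
            # Choose the query matrix for this (type of i, type of j) pair.
            q = matvec(queries[(token_types[i], token_types[j])], h_i)
            dot = sum(a * b for a, b in zip(q, keys[j]))
            scores.append(dot / math.sqrt(dim))
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def random_matrix(rng, dim):
    """Square matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


rng = random.Random(0)
dim = 4
# Three word tokens followed by one entity token (toy input).
token_types = ["word", "word", "word", "entity"]
hidden = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in token_types]
queries = {
    (a, b): random_matrix(rng, dim)
    for a in ("word", "entity")
    for b in ("word", "entity")
}
key, value = random_matrix(rng, dim), random_matrix(rng, dim)

contextual = entity_aware_attention(hidden, token_types, queries, key, value)
print(len(contextual), len(contextual[0]))
```

If all four query matrices are set to the same matrix, the function reduces to ordinary self-attention, which is one way to see that the mechanism is an extension of the standard one.
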
I would like to thank Mr. Yamada (@ikuyamada), the developer of LUKE, and Studio Ousia (@StudioOusia).

[1]

    @inproceedings{yamada2020luke,
      title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
      author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
      booktitle={EMNLP},
      year={2020}
    }