Currently covers English, Chinese, French, Hindi, Spanish, and Arabic.
👨🏻‍💻 GitHub • 📃 Paper • 🌐 Demo • 🤗 ApolloCorpus • 🤗 XMedBench
Chinese | English


User:{query}\nAssistant:{response}<|endoftext|>
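
The template above can be filled in programmatically. The following is a minimal sketch, not code from the Apollo repository: the helper names `build_training_text` and `build_prompt` are hypothetical, and ending the inference prompt right after `Assistant:` is an assumption based on the template.

```python
EOS_TOKEN = "<|endoftext|>"  # end-of-text marker taken from the template above


def build_training_text(query: str, response: str) -> str:
    """Fill the full template for one question/answer turn (hypothetical helper)."""
    return f"User:{query}\nAssistant:{response}{EOS_TOKEN}"


def build_prompt(query: str) -> str:
    """Build an inference prompt that stops after 'Assistant:' (an assumption)."""
    return f"User:{query}\nAssistant:"


if __name__ == "__main__":
    # repr() makes the embedded newline visible in the printed output.
    print(repr(build_training_text("What is hypertension?", "High blood pressure.")))
    print(repr(build_prompt("What is hypertension?")))
```

Keeping the template in one place makes it harder for training text and inference prompts to drift apart.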
Dataset 🤗 ApolloCorpus
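
The nested-list layout listed in this section stores each dialogue as a flat list of strings. A minimal sketch of how such a list could be regrouped into (question, answer) pairs follows. It assumes the strings strictly alternate between question and answer, as the `q1`, `a1`, `q2`, `a2` placeholders suggest. The helper name `pair_turns` and the inline sample data are hypothetical.

```python
import json
from typing import List, Tuple


def pair_turns(dialogue: List[str]) -> List[Tuple[str, str]]:
    """Group a flat [q1, a1, q2, a2, ...] list into (question, answer) tuples."""
    if len(dialogue) % 2 != 0:
        raise ValueError("expected an even number of strings (question/answer pairs)")
    # Even positions hold questions, odd positions hold answers (assumed).
    return list(zip(dialogue[0::2], dialogue[1::2]))


# Inline sample standing in for the contents of a dataset file.
raw = '[["q1", "a1", "q2", "a2"], ["q3", "a3"]]'
dialogues = json.loads(raw)
paired = [pair_turns(d) for d in dialogues]

if __name__ == "__main__":
    print(paired)
```

Rejecting odd-length lists surfaces truncated dialogues early rather than silently dropping the last question.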

[
  "string1",
  "string2",
  ...
]

[
  [
    "q1",
    "a1",
    "q2",
    "a2",
    ...
  ],
  ...
]

[
  [
    "q1",
    "a1",
    "q2",
    "a2",
    ...
  ],
  ...
]

Evaluation 🤗 XMedBench
English (EN):
Chinese (ZH):
Spanish (ES): Head_qa
French (FR): Frenchmedmcqa
Hindi (HI): MMLU_HI
Arabic (AR): MMLU_Ara
import argparse

import torch
import torch_npu  # noqa: F401  imported for its side effect: enables the Ascend NPU backend in PyTorch

import openmind
from openmind import AutoTokenizer


def parse_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to the model",
        default="LF_AICC/Apollo-6B",
    )
    args = parser.parse_args()
    return args


args = parse_args()
model = args.model_name_or_path

# Load the tokenizer and build a text-generation pipeline in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = openmind.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Sample a single completion, capped at 256 tokens in total (prompt included).
sequences = pipeline(
    "<|im_start|>user\nDoes P=NP?<|im_end|>\n<|im_start|>assistant\n",
    max_length=256,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")

If you intend to use our dataset for training or evaluation, please use the following citation:
@misc{wang2024apollo,
title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
year={2024},
eprint={2403.03640},
archivePrefix={arXiv},
primaryClass={cs.CL}
}