(Evaluation in progress)

Hermes + Leo + German Laser = Germeo

Germeo-7B-Laser

A model merged from Hermeo-7B that understands both German and English but replies only in German.

Model details

Merged from: leo-mistral-hessianai-7b-chat and DPOpenHermes-7B-v2

Model type: Causal decoder-only openmind language model

Language: German replies, with English understanding capabilities

Laser-Data：LeoLM/OpenSchnabeltier

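This is an early experiment on laser and its effect on language understanding. Laser generally improves language understanding capabilities. The hypothesis is that it lowers the probability of English replies and raises the probability of German replies. The model's internal German capabilities are strengthened as a result.

Updates will follow...

Prompt format: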

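# Optional: stream tokens as they are generated.
# TextStreamer is provided by transformers; the tokenizer must already be loaded.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
# Build the ChatML prompt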
prompt_template = """<|im_start|>system
Du bist ein hilfreicher Assistent.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

prompt = "Schreibe eine Stellenanzeige für Data Scientist bei AXA!"

final_prompt = prompt_template.format(prompt=prompt)
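
The template above follows the ChatML layout. As a minimal sketch, it can be wrapped in a small reusable helper. The function name `build_prompt` and the `DEFAULT_SYSTEM` constant are illustrative assumptions and not part of the original card.

```python
# Illustrative helper (hypothetical name); mirrors the template shown above.
DEFAULT_SYSTEM = "Du bist ein hilfreicher Assistent."


def build_prompt(user_prompt: str, system_prompt: str = DEFAULT_SYSTEM) -> str:
    """Return a ChatML-style prompt that ends with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant"
    )


final_prompt = build_prompt("Schreibe eine Stellenanzeige für Data Scientist bei AXA!")
print(final_prompt)
```

Keeping the system message as a parameter makes it easy to try other German system prompts without editing the template string.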

Code usage:

The core steps for loading this model and setting up a generation pipeline are:

import argparse
import torch
from openmind import pipeline, is_torch_npu_available, AutoModelForCausalLM, AutoTokenizer

# Check if NPU or CPU is available
if is_torch_npu_available():
  device = "npu:0"
  print("Using NPU for inference")
else:
  device = "cpu"
  print("Using CPU for inference")

# Load the model and tokenizer from the same checkpoint
model_path = "SY_AICC/germeo-7b-laser"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Define the pipeline with the correct task
generate_text = pipeline(
  task="text-generation",  # Specify the task
  model=model,
  tokenizer=tokenizer,
  torch_dtype=torch.bfloat16 if device != "cpu" else torch.float32,  # Use bfloat16 if not on CPU
  device=device
)
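
A short sketch of how the pipeline defined above might be called and its output cleaned up. It assumes the openmind `pipeline` returns the same structure as the Hugging Face text-generation pipeline (a list of dicts with a `generated_text` key that includes the prompt). The helper name `extract_reply` and the `fake_output` stand-in are hypothetical, used here so the snippet runs without loading the model.

```python
def extract_reply(outputs, prompt):
    """Strip the echoed prompt from a text-generation result.

    Assumes `outputs` looks like [{"generated_text": "<prompt><reply>"}].
    """
    text = outputs[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# With the real pipeline this would be (assumed call signature):
#   outputs = generate_text(final_prompt, max_new_tokens=256)
# A stand-in result is used here so the example runs without the model.
prompt = "<|im_start|>user\nHallo!<|im_end|>\n<|im_start|>assistant"
fake_output = [{"generated_text": prompt + "\nHallo! Wie kann ich helfen?"}]
print(extract_reply(fake_output, prompt))
```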

German benchmarks

German tasks:	MMLU-DE	Hellaswag-DE	ARC-DE	Average
Model / few-shot:	(5-shot)	(10-shot)	(24-shot)
7B parameters
llama-2-7b	0.400	0.513	0.381	0.431
leo-hessianai-7b	0.400	0.609	0.429	0.479
bloom-6b4-clp-german	0.274	0.550	0.351	0.392
mistral-7b	0.524	0.588	0.473	0.528
leo-mistral-hessianai-7b	0.481	0.663	0.485	0.543
leo-mistral-hessianai-7b-chat	0.458	0.617	0.465	0.513
DPOpenHermes-7B-v2	0.517	0.603	0.515	0.545
hermeo-7b	0.511	0.668	0.528	0.569
germeo-7b-laser (this model)	?	?	?	?
13B parameters
llama-2-13b	0.469	0.581	0.468	0.506
leo-hessianai-13b	0.486	0.658	0.509	0.551
70B parameters
llama-2-70b	0.597	0.674	0.561	0.611
leo-hessianai-70b	0.653	0.721	0.600	0.658

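Although the model does not generate English text unless explicitly asked to, its performance on English benchmarks remains strong, close to that of hermeo-7b:

English benchmarks

English tasks:	MMLU	Hellaswag	ARC	Average
Model / few-shot:	(5-shot)	(10-shot)	(24-shot)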
llama-2-7b	0.466	0.786	0.530	0.594
leolm-hessianai-7b	0.423	0.759	0.522	0.568
bloom-6b4-clp-german	0.264	0.525	0.328	0.372
mistral-7b	0.635	0.832	0.607	0.691
leolm-mistral-hessianai-7b	0.550	0.777	0.518	0.615
hermeo-7b	0.601	0.821	0.620	0.681
germeo-7b-laser (this model)	0.601	0.828	0.608	0.679
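
To make the comparison with hermeo-7b concrete, the per-task differences can be computed from the last two rows of the table above. This is only a sketch over the published numbers; the variable names are illustrative.

```python
# Scores copied from the English benchmark table above.
tasks = ["MMLU", "Hellaswag", "ARC"]
hermeo = [0.601, 0.821, 0.620]
germeo = [0.601, 0.828, 0.608]

# Per-task difference (germeo minus hermeo), rounded to three decimals.
deltas = {t: round(g - h, 3) for t, g, h in zip(tasks, germeo, hermeo)}

for task, delta in deltas.items():
    print(f"{task}: {delta:+.3f}")
```

A positive value means germeo-7b-laser scores higher on that task, a negative value means hermeo-7b does.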

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Average	62.82
AI2 Reasoning Challenge (25-shot)	60.75
HellaSwag (10-shot)	82.81
MMLU (5-shot)	60.57
TruthfulQA (0-shot)	53.83
Winogrande (5-shot)	75.61
GSM8k (5-shot)	43.37
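
The reported average is the unweighted mean of the six benchmark scores. A quick check, with the scores copied from the table above (the dictionary keys are shortened labels chosen for illustration):

```python
# Scores copied from the leaderboard table above.
scores = {
    "ARC": 60.75,
    "HellaSwag": 82.81,
    "MMLU": 60.57,
    "TruthfulQA": 53.83,
    "Winogrande": 75.61,
    "GSM8k": 43.37,
}

# Unweighted mean over all six benchmarks, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```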