
Turkish language model based on Llama 3: aerdincdal/CBDDO-LLM-8B-Instruct-v1

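aerdincdal/CBDDO-LLM-8B-Instruct-v1 is a Turkish language model built on the Llama 3 architecture and instruction-tuned on a custom dataset of 2.5 million rows. It is intended for a broad range of natural language processing tasks. Training focused on the grammar and syntax of Turkish so that the model generates fluent, accurate text.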

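Key features:

  • Llama 3 architecture: the model builds on Llama 3, an efficient and modern foundation for natural language processing models.
  • Training on a large dataset: the model was trained on a dataset of 2.5 million rows, which helps it capture the structure and nuances of the language.
  • High performance: the model handles complex language-processing tasks quickly and efficiently.
  • Versatility: it performs well across text generation, translation, question answering, summarization, and code writing.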

Using the model:

  1. Install the required libraries:

    pip install transformers torch accelerate
  2. Test the model:

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline
import torch

model_id = "aerdincdal/CBDDO-LLM-8B-Instruct-v1"
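# "npu:0" targets an Ascend NPU and requires the torch_npu package;
# use "cuda:0" or "cpu" on other hardware.
device = "npu:0"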
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  
    device_map=device,           
    trust_remote_code=True       
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True       
)

# Streams tokens to stdout as they are generated (the prompt is echoed as well)
streamer = TextStreamer(tokenizer)

text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},  
    streamer=streamer
)

messages = [
    {"role": "system", "content": "Her zaman düşünceli yanıtlar veren bir chatbot'sun."},
    {"role": "user", "content": "Mona Lisa tablosu hakkında ne düşünüyorsun?"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Token ids that stop generation
terminators = [
    tokenizer.eos_token_id
]

outputs = text_generation_pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)

# Drop the prompt prefix so that only the model's reply is printed
print(outputs[0]["generated_text"][len(prompt):])

Output:

1503'te Leonardo da Vinci tarafından resmedilen Mona Lisa, 16. yüzyılda Avrupa'da resim sanatının en ünlü eserlerinden biridir. Eski bir İtalyan aristokratı olan Lisa del Giocondo'ya benzeyen bir kadın portresidir. Bu tablo, Leonardo da Vinci'nin en ünlü eserlerinden biri olarak kabul edilir ve sanatın en iyi örneklerinden biri olarak kabul edilir. Mona Lisa'nın önemi, resim sanatının gelişiminde ve sanat tarihi boyunca etkisinin büyüklüğüne dayanmaktadır.
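
For reference, the `tokenizer.apply_chat_template` call in the script above turns the message list into a single prompt string. The sketch below reproduces the stock Llama 3 Instruct layout by hand, assuming this fine-tune kept that template unchanged; the helper name `format_llama3_prompt` is hypothetical, and the tokenizer's own template remains the authoritative source.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the stock Llama 3 Instruct chat layout (assumed, not verified for this model)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: a role header, a blank line, the trimmed content, then an end-of-turn marker.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header tells the model to start its reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Her zaman düşünceli yanıtlar veren bir chatbot'sun."},
    {"role": "user", "content": "Mona Lisa tablosu hakkında ne düşünüyorsun?"},
]
print(format_llama3_prompt(messages))
```

In this layout every turn ends with `<|eot_id|>`. If generation runs past the end of the assistant's reply, one thing to check is whether that token is among the stop ids; `tokenizer.convert_tokens_to_ids("<|eot_id|>")` returns its id when the token exists in the vocabulary.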

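Application areas:

  • Text generation: produce text in a variety of genres and tones.
  • Translation: drawing on its multilingual ability, translate or interpret text into other languages.
  • Question answering: answer a wide range of questions, including challenging ones.
  • Summarization: condense long texts into short, concise summaries.
  • Code writing: generate code that meets a given requirement.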
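
Each of these tasks goes through the same chat interface, with only the user message changing. As a rough sketch, the helper below builds a `messages` list per task. The names `build_messages` and `TASK_INSTRUCTIONS` and the Turkish instruction wording are illustrative assumptions, not part of the model card; only the system prompt is taken from the examples.

```python
# System prompt reused from the examples in this document
SYSTEM_PROMPT = "Her zaman düşünceli yanıtlar veren bir chatbot'sun."

# Hypothetical task-to-instruction table; the wording is an assumption
TASK_INSTRUCTIONS = {
    "summarize": "Aşağıdaki metni özetle:",
    "translate_en": "Aşağıdaki metni İngilizceye çevir:",
    "answer": "Aşağıdaki soruyu yanıtla:",
}


def build_messages(task, text):
    """Return a chat message list for the given task and input text."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{TASK_INSTRUCTIONS[task]}\n\n{text}"},
    ]


messages = build_messages(
    "summarize",
    "Mona Lisa, Leonardo da Vinci tarafından yapılmış bir portredir.",
)
print(messages[1]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the test script.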

Code-writing example:

In this example, the model writes a Python function that converts text to uppercase:

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline
import torch

model_id = "aerdincdal/CBDDO-LLM-8B-Instruct-v1"
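# "npu:0" targets an Ascend NPU and requires the torch_npu package;
# use "cuda:0" or "cpu" on other hardware.
device = "npu:0"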
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  
    device_map=device,           
    trust_remote_code=True       
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True       
)

# Streams tokens to stdout as they are generated (the prompt is echoed as well)
streamer = TextStreamer(tokenizer)

text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},  
    streamer=streamer
)

messages = [
    {"role": "system", "content": "Her zaman düşünceli yanıtlar veren bir chatbot'sun."},
    {"role": "user", "content": "Python ile bir metni büyük harfe çeviren bir fonksiyon yaz."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    tokenizer.eos_token_id
]

outputs = text_generation_pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)

# Drop the prompt prefix so that only the model's reply is printed
print(outputs[0]["generated_text"][len(prompt):])

Output:

def metni_buyuk_harfe_cevir(metin):
    """Bir metni tümüyle büyük harfe çeviren Python fonksiyonu.

    Args:
        metin: Küçük harflerle yazılmış bir metin.

    Returns:
        Büyük harflerle yazılmış metin.
    """
    return metin.upper()

# Örnek kullanım
metin = "Bu bir deneme metnidir."
buyuk_harf_metin = metni_buyuk_harfe_cevir(metin)
print(buyuk_harf_metin)

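Explanation: Given the instruction ("Write a function in Python that converts a text to uppercase."), the model generates complete Python code that includes a docstring and explanatory comments. The function converts lowercase text to uppercase, which makes the text easier to manipulate.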
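
One caveat is worth checking when running the generated function on Turkish text: Python's `str.upper()` applies locale-independent Unicode case mapping, so the dotted lowercase `i` becomes `I` rather than the Turkish `İ`. The sketch below shows the difference and one simple workaround; the helper name `turkish_upper` is hypothetical and is not part of the model's output.

```python
def turkish_upper(text):
    """Uppercase text using the Turkish dotted/dotless i convention."""
    # Map dotted i to İ first; str.upper() already maps dotless ı to I.
    return text.replace("i", "İ").upper()


sample = "Bu bir deneme metnidir."
print(sample.upper())         # BU BIR DENEME METNIDIR.
print(turkish_upper(sample))  # BU BİR DENEME METNİDİR.
```

Whether this matters depends on the application; for display text in Turkish, the second form is the conventional spelling.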

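With these simple steps, you can push the limits of Turkish natural language processing and explore how our language model can help you. Join us on this technical journey and expand what you can do with language processing!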

BENCHMARK:

  "config_general": {
    "lighteval_sha": "494ee12240e716e804ae9ea834f84a2c864c07ca",
    "num_few_shot_default": 0,
    "num_fewshot_seeds": 1,
    "override_batch_size": 1,
    "max_samples": null,
    "job_id": "",
    "start_time": 1781075.607155059,
    "end_time": 1784655.466140587,
    "total_evaluation_time_secondes": "3579.858985528117",
    "model_name": "aerdincdal/CBDDO-LLM-8B-Instruct-v1",
    "model_sha": "84430552036c85cc6a16722b26496df4d93f3afe",
    "model_dtype": "torch.bfloat16",
    "model_size": "15.08 GB"
  },
  "results": {
    "harness|arc:challenge|25": {
      "acc": 0.4991467576791809,
      "acc_stderr": 0.014611369529813262,
      "acc_norm": 0.5460750853242321,
      "acc_norm_stderr": 0.014549221105171872
    },
    "harness|hellaswag|10": {
      "acc": 0.5552678749253137,
      "acc_stderr": 0.004959204773046207,
      "acc_norm": 0.7633937462656841,
      "acc_norm_stderr": 0.004241299341050841
    },
    "harness|hendrycksTest-abstract_algebra|5": {
      "acc": 0.35,
      "acc_stderr": 0.047937248544110196,
      "acc_norm": 0.35,
      "acc_norm_stderr": 0.047937248544110196
    },
    "harness|hendrycksTest-anatomy|5": {
      "acc": 0.6148148148148148,
      "acc_stderr": 0.04203921040156279,
      "acc_norm": 0.6148148148148148,
      "acc_norm_stderr": 0.04203921040156279
    },
    "harness|hendrycksTest-astronomy|5": {
      "acc": 0.5986842105263158,
      "acc_stderr": 0.039889037033362836,
      "acc_norm": 0.5986842105263158,
      "acc_norm_stderr": 0.039889037033362836
    },
    "harness|hendrycksTest-business_ethics|5": {
      "acc": 0.62,
      "acc_stderr": 0.048783173121456316,
      "acc_norm": 0.62,
      "acc_norm_stderr": 0.048783173121456316
    },
    "harness|hendrycksTest-clinical_knowledge|5": {
      "acc": 0.7094339622641509,
      "acc_stderr": 0.02794321998933714,
      "acc_norm": 0.7094339622641509,
      "acc_norm_stderr": 0.02794321998933714
    }