SmolLM2

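Model Summary

SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They can solve a wide range of tasks while being lightweight enough to run on-device.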

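SmolLM2 shows significant advances over its predecessor SmolLM1, particularly in instruction following, knowledge, and reasoning. The 360M model was trained on 4 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, The Stack, and new filtered datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) on a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback.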
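To make the DPO step above more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not the training code used for SmolLM2: the function names are hypothetical, the inputs are assumed to be summed log-probabilities of each response under the policy and under a frozen reference model, and `beta=0.1` is an illustrative value rather than a reported hyperparameter.

```python
import math


def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    beta is an assumed value for illustration, not one taken from the model card.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen response over the rejected one.
    return neg_log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy and the reference agree exactly, the loss equals log 2; it falls below that value once the policy shifts probability toward the preferred response.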

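Thanks to datasets developed by Argilla, such as Synth-APIGen-v0.1, the instruct model also supports tasks such as text rewriting, summarization, and function calling (for the 1.7B version). You can find the SFT dataset at https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk and the fine-tuning code at https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm2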

How to use

Use in Openmind

from openmind import AutoTokenizer, AutoModelForCausalLM, is_torch_npu_available
from openmind_hub import snapshot_download
import torch
import openmind
import argparse
import time

def generate_text(prompt, model, tokenizer, device):
    text_generator = openmind.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map=device,
        tokenizer=tokenizer,
    )

    formatted_prompt = f"Question: {prompt} Answer:"

    sequences = text_generator(
        formatted_prompt,
        do_sample=True,
        top_k=5,
        top_p=0.9,
        num_return_sequences=1,
        repetition_penalty=1.5,
        max_new_tokens=128,
    )

    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="jeffding/SmolLM2-360M-Instruct-openmind",
    )
    args = parser.parse_args()
    return args

def main():
    args = parse_args()
    model_path = args.model_name_or_path

    if is_torch_npu_available():
        device = "npu:0"
    else:
        device = "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_path,trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_path,trust_remote_code=True)
    model = model.to(device)
    
    start_time = time.time()
    
    # infer
    messages = [{"role": "user", "content": "What is the capital of France."}]
    input_text=tokenizer.apply_chat_template(messages, tokenize=False)
    print(input_text)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
    print(tokenizer.decode(outputs[0]))

    
    end_time = time.time()
    print(f"Hardware: {device}, inference time: {end_time - start_time:.2f} s")
    
if __name__ == "__main__":
    main()

Transformers

pip install transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"

device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France."}]
input_text=tokenizer.apply_chat_template(messages, tokenize=False)
print(input_text)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
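The `generate` call above samples with `temperature=0.2` and `top_p=0.9`. As a rough illustration of what those two parameters do, the following sketch applies temperature scaling and nucleus (top-p) filtering to a plain list of logits. It shows the general technique only; `sample_top_p` is a hypothetical helper, and the transformers implementation may differ in edge cases.

```python
import math
import random


def sample_top_p(logits, temperature=0.2, top_p=0.9, rng=random):
    """Sample one token index with temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose total mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens, weighted by their original probabilities.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

With a low temperature such as 0.2, most of the probability mass tends to concentrate on a few tokens, so the top-p filter often keeps only a handful of candidates and the output stays close to greedy decoding.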

Chat in TRL

You can also use the TRL CLI to chat with the model from the terminal:

pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-360M-Instruct --device cpu

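Evaluation

In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them.

Base Pre-Trained Model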

Metric	SmolLM2-360M	Qwen2.5-0.5B	SmolLM-360M
HellaSwag	54.5	51.2	51.8
ARC (Average)	53.0	45.4	50.1
PIQA	71.7	69.9	71.6
MMLU (cloze)	35.8	33.7	34.4
CommonsenseQA	38.0	31.6	35.3
TriviaQA	16.9	4.3	9.1
Winogrande	52.5	54.1	52.8
OpenBookQA	37.4	37.4	37.2
GSM8K (5-shot)	3.2	33.4	1.6

Instruction Model

Metric	SmolLM2-360M-Instruct	Qwen2.5-0.5B-Instruct	SmolLM-360M-Instruct
IFEval (Average prompt/inst)	41.0	31.6	19.8
MT-Bench	3.66	4.16	3.37
HellaSwag	52.1	48.0	47.9
ARC (Average)	43.7	37.3	38.8
PIQA	70.8	67.2	69.4
MMLU (cloze)	32.8	31.7	30.6
BBH (3-shot)	27.3	30.7	24.4
GSM8K (5-shot)	7.43	26.8	1.36
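For readers who want to compare the columns programmatically, the short sketch below copies the instruction-model scores from the table above and reports, per metric, which model leads and how far SmolLM2-360M-Instruct moved relative to SmolLM-360M-Instruct. The helper names are hypothetical, and the sketch assumes that higher is better for every row.

```python
# Rows copied from the instruction-model table:
# (metric, SmolLM2-360M-Instruct, Qwen2.5-0.5B-Instruct, SmolLM-360M-Instruct)
ROWS = [
    ("IFEval (Average prompt/inst)", 41.0, 31.6, 19.8),
    ("MT-Bench", 3.66, 4.16, 3.37),
    ("HellaSwag", 52.1, 48.0, 47.9),
    ("ARC (Average)", 43.7, 37.3, 38.8),
    ("PIQA", 70.8, 67.2, 69.4),
    ("MMLU (cloze)", 32.8, 31.7, 30.6),
    ("BBH (3-shot)", 27.3, 30.7, 24.4),
    ("GSM8K (5-shot)", 7.43, 26.8, 1.36),
]
MODELS = ("SmolLM2-360M-Instruct", "Qwen2.5-0.5B-Instruct", "SmolLM-360M-Instruct")


def best_per_metric(rows):
    """Map each metric to the model with the highest score."""
    return {
        metric: MODELS[max(range(len(scores)), key=lambda i: scores[i])]
        for metric, *scores in rows
    }


def gain_over_smollm(rows):
    """Absolute change of SmolLM2-360M-Instruct relative to SmolLM-360M-Instruct."""
    return {metric: round(new - old, 2) for metric, new, _qwen, old in rows}


if __name__ == "__main__":
    leaders = best_per_metric(ROWS)
    gains = gain_over_smollm(ROWS)
    for metric, *_ in ROWS:
        print(f"{metric}: best={leaders[metric]}, gain over SmolLM={gains[metric]:+}")
```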

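Limitations

SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate or logically consistent, and it may reflect biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.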

Training

Model

Architecture: Transformer decoder
Pretraining tokens: 4T
Precision: bfloat16
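As a back-of-the-envelope illustration of what these figures imply, the sketch below estimates the size of the weights at a given precision and the ratio of training tokens to parameters. The parameter counts are the nominal sizes from the model summary (actual checkpoints may differ slightly), the helper names are hypothetical, and the estimate covers weights only, ignoring activations and the KV cache.

```python
# Nominal parameter counts from the model summary; real checkpoints differ slightly.
NOMINAL_PARAMS = {"135M": 135e6, "360M": 360e6, "1.7B": 1.7e9}
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}


def weight_memory_gb(num_params, dtype="bfloat16"):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def tokens_per_param(num_tokens, num_params):
    """Ratio of pretraining tokens to model parameters."""
    return num_tokens / num_params


if __name__ == "__main__":
    for name, count in NOMINAL_PARAMS.items():
        print(f"{name}: ~{weight_memory_gb(count):.2f} GB of weights in bfloat16")
    # 4T pretraining tokens is the figure stated for the 360M model.
    ratio = tokens_per_param(4e12, NOMINAL_PARAMS["360M"])
    print(f"360M: ~{ratio:,.0f} training tokens per parameter")
```

Under these assumptions the 360M model needs roughly 0.72 GB for its weights in bfloat16, which is consistent with the claim that it is light enough to run on-device.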

Hardware

GPUs: 64 H100

Software

Training framework: nanotron

License

Apache 2.0

Citation

@misc{allal2024SmolLM2,
      title={SmolLM2 - with great data, comes great performance}, 
      author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Lewis Tunstall and Agustín Piqueres and Andres Marafioti and Cyril Zakka and Leandro von Werra and Thomas Wolf},
      year={2024},
}
