
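SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They can solve a wide range of tasks while remaining lightweight enough to run on-device.

SmolLM2 shows significant advances over its predecessor SmolLM1, particularly in instruction following, knowledge, and reasoning. The 360M model was trained on 4 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, The Stack, and new filtered datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) on a mix of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback.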
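
The DPO step can be illustrated with the standard DPO objective for a single preference pair. This is a minimal sketch, not the training code used for SmolLM2: the function name `dpo_pair_loss`, the example log-probabilities, and `beta=0.1` are assumptions made for illustration.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or under the frozen reference model.
    beta=0.1 is an assumed value, not one taken from the SmolLM2 recipe.
    """
    # How far the policy has moved away from the reference on each response.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities, for illustration only.
baseline = dpo_pair_loss(-12.0, -12.0, -12.0, -12.0)  # policy equals reference
improved = dpo_pair_loss(-10.0, -14.0, -12.0, -12.0)  # policy favors the chosen response
print(baseline, improved)
```

The loss falls below its starting value of log 2 only when the policy raises the chosen response relative to the rejected one by more than the reference model does.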
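
Thanks to datasets developed by Argilla, such as Synth-APIGen-v0.1, the instruct model also supports tasks such as text rewriting, summarization, and function calling (for the 1.7B version). You can find the SFT dataset here: https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk and the fine-tuning code in the alignment handbook: https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm2

For more details, refer to: https://github.com/huggingface/smollm. There you will find pre-training, post-training, evaluation, and local inference code.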
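
The usage scripts in this card call `model.generate` with `do_sample=True`, `temperature=0.2`, and `top_p=0.9`. As a rough illustration of what those two parameters do to a next-token distribution, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is not the Transformers implementation, and the function name `sample_token` and the toy logits are hypothetical.

```python
import math
import random


def sample_token(logits, temperature=0.2, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the weights of the surviving tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Toy logits for a three-token vocabulary (illustrative values only).
print(sample_token([2.0, 1.0, 0.0]))
```

With a temperature as low as 0.2, most of the probability mass concentrates on the top few tokens, so sampling behaves close to greedy decoding while still allowing some variation.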

Inference with openMind (runs on an NPU when one is available, otherwise on CPU):

```python
from openmind import AutoTokenizer, AutoModelForCausalLM, is_torch_npu_available
from openmind_hub import snapshot_download
import torch
import openmind
import argparse
import time


def generate_text(prompt, model, tokenizer, device):
    # Alternative pipeline-based helper; it is not called by main() below.
    text_generator = openmind.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map=device,
        tokenizer=tokenizer,
    )
    formatted_prompt = f"Question: {prompt} Answer:"
    sequences = text_generator(
        formatted_prompt,
        do_sample=True,
        top_k=5,
        top_p=0.9,
        num_return_sequences=1,
        repetition_penalty=1.5,
        max_new_tokens=128,
    )
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="jeffding/SmolLM2-360M-Instruct-openmind",
    )
    args = parser.parse_args()
    return args


def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Prefer the first NPU when available; otherwise fall back to CPU.
    if is_torch_npu_available():
        device = "npu:0"
    else:
        device = "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
    model = model.to(device)

    # Time only the inference step, not model loading.
    start_time = time.time()
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    input_text = tokenizer.apply_chat_template(messages, tokenize=False)
    print(input_text)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
    print(tokenizer.decode(outputs[0]))
    end_time = time.time()
    print(f"Hardware: {device}, inference time: {end_time - start_time} seconds")


if __name__ == "__main__":
    main()
```

Inference with Transformers:

```bash
pip install transformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"
device = "cuda"  # use "cuda" for GPU or "cpu" for CPU
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# For multiple GPUs, install accelerate and use
# `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(input_text)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
```

You can also chat with the model in the terminal using the TRL CLI:

```bash
pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-360M-Instruct --device cpu
```

This section reports the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them.

Base pre-trained model:

| Metric | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M |
|---|---|---|---|
| HellaSwag | 54.5 | 51.2 | 51.8 |
| ARC (Average) | 53.0 | 45.4 | 50.1 |
| PIQA | 71.7 | 69.9 | 71.6 |
| MMLU (cloze) | 35.8 | 33.7 | 34.4 |
| CommonsenseQA | 38.0 | 31.6 | 35.3 |
| TriviaQA | 16.9 | 4.3 | 9.1 |
| Winogrande | 52.5 | 54.1 | 52.8 |
| OpenBookQA | 37.4 | 37.4 | 37.2 |
| GSM8K (5-shot) | 3.2 | 33.4 | 1.6 |

Instruction model:

| Metric | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct |
|---|---|---|---|
| IFEval (Average prompt/inst) | 41.0 | 31.6 | 19.8 |
| MT-Bench | 3.66 | 4.16 | 3.37 |
| HellaSwag | 52.1 | 48.0 | 47.9 |
| ARC (Average) | 43.7 | 37.3 | 38.8 |
| PIQA | 70.8 | 67.2 | 69.4 |
| MMLU (cloze) | 32.8 | 31.7 | 30.6 |
| BBH (3-shot) | 27.3 | 30.7 | 24.4 |
| GSM8K (5-shot) | 7.43 | 26.8 | 1.36 |
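
As a small worked example of reading the instruction-model table, the sketch below finds the leading model on each metric and the gain of SmolLM2-360M-Instruct over SmolLM-360M-Instruct. The scores are copied from the table; the names `MODELS`, `SCORES`, `best_model_per_metric`, and `gain_over_predecessor` are illustrative, and metric labels are shortened.

```python
# Column order: SmolLM2-360M-Instruct, Qwen2.5-0.5B-Instruct, SmolLM-360M-Instruct.
MODELS = ("SmolLM2-360M-Instruct", "Qwen2.5-0.5B-Instruct", "SmolLM-360M-Instruct")

# Scores copied from the instruction-model table (higher is better on every row).
SCORES = {
    "IFEval": (41.0, 31.6, 19.8),
    "MT-Bench": (3.66, 4.16, 3.37),
    "HellaSwag": (52.1, 48.0, 47.9),
    "ARC": (43.7, 37.3, 38.8),
    "PIQA": (70.8, 67.2, 69.4),
    "MMLU": (32.8, 31.7, 30.6),
    "BBH": (27.3, 30.7, 24.4),
    "GSM8K": (7.43, 26.8, 1.36),
}


def best_model_per_metric(scores):
    """Map each metric to the name of the model with the highest score."""
    return {
        metric: MODELS[max(range(len(row)), key=row.__getitem__)]
        for metric, row in scores.items()
    }


def gain_over_predecessor(scores):
    """Score difference between SmolLM2-360M-Instruct and SmolLM-360M-Instruct."""
    return {metric: round(row[0] - row[2], 2) for metric, row in scores.items()}


leaders = best_model_per_metric(SCORES)
gains = gain_over_predecessor(SCORES)
for metric in SCORES:
    print(f"{metric:10s} leader={leaders[metric]:22s} gain={gains[metric]:+.2f}")
```

On these numbers, SmolLM2-360M-Instruct leads on five of the eight metrics, while Qwen2.5-0.5B-Instruct leads on MT-Bench, BBH, and GSM8K.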
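
SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate or logically consistent, and it may reflect biases present in the training data. These models should be used as assistive tools rather than as authoritative sources of information. Users should always verify important information and critically evaluate any generated content.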

```bibtex
@misc{allal2024SmolLM2,
      title={SmolLM2 - with great data, comes great performance},
      author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Lewis Tunstall and Agustín Piqueres and Andres Marafioti and Cyril Zakka and Leandro von Werra and Thomas Wolf},
      year={2024},
}
```