OLMo-2-1124-7B-Instruct

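Note: update of January 3, 2025:

After the initial release of the OLMo-2 models, we found that the post-trained models did not use the same pre-tokenization logic as the base models. We therefore retrained the post-trained models. The new models are released under the same names as the originals, and the old models remain available with a "-preview" suffix. For the collection of old models, see OLMo 2 Preview Post-trained Models.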

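Release documentation

OLMo 2 7B Instruct November 2024 is a post-trained variant of the OLMo-2 7B November 2024 model. It first underwent supervised fine-tuning (SFT) on an OLMo-specific variant of the Tülu 3 dataset, then Direct Preference Optimization (DPO) training on this dataset, and finally RLVR (reinforcement learning with verifiable rewards) training using this data. Tülu 3 is designed for state-of-the-art performance on a diverse range of tasks beyond chat, such as MATH, GSM8K, and IFEval. For more details, see the OLMo 2 paper or the Tülu 3 paper.

OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details. The core models in this release are: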

Stage	OLMo 2 7B	OLMo 2 13B
Base model	allenai/OLMo2-7B-1124	allenai/OLMo-2-13B-1124
Supervised fine-tuning (SFT)	allenai/OLMo-2-1124-7B-SFT	allenai/OLMo-2-1124-13B-SFT
Direct Preference Optimization (DPO)	allenai/OLMo-2-1124-7B-DPO	allenai/OLMo-2-1124-13B-DPO
Final model (RLVR)	allenai/OLMo-2-1124-7B-Instruct	allenai/OLMo-2-1124-13B-Instruct
Reward model (RM)	allenai/OLMo-2-1124-7B-RM	allenai/OLMo-2-1124-13B-RM

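Model description

Model type: A model trained on a mix of publicly available, synthetic, and human-created datasets.
Language(s) (NLP): Primarily English
License: Apache 2.0
Fine-tuned from model: allenai/OLMo-2-7B-1124-DPO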

Model sources

Project page: https://allenai.org/olmo
Repositories:
- Core repo (training, inference, fine-tuning, etc.): https://github.com/allenai/OLMo
- Evaluation code: https://github.com/allenai/olmes
- Further fine-tuning code: https://github.com/allenai/open-instruct
Paper: https://arxiv.org/abs/2501.00656
Demo: https://playground.allenai.org/

Installation

OLMo 2 will be supported in the next version of Transformers. Until then, install Transformers from the main branch:

pip install --upgrade git+https://github.com/huggingface/transformers.git

Using the model

Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:

from transformers import AutoModelForCausalLM

olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B-Instruct")

Chat template

The chat template for our models is formatted as:

<|endoftext|><|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>

Or with the newlines expanded:

<|endoftext|><|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>

It is also embedded in the tokenizer, for use with tokenizer.apply_chat_template.
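
To make the format above concrete, here is a minimal sketch that rebuilds the single-turn string by hand. It is not the tokenizer's own template code: `render_chat`, `BOS`, and `EOS` are hypothetical names, only the two roles shown in this card are handled, and the newline placed between turns of a longer conversation is an assumption that this card does not confirm. In practice, tokenizer.apply_chat_template remains the authoritative path.

```python
# Hypothetical helper: rebuilds the chat format shown above by hand.
BOS = "<|endoftext|>"  # the example above starts with this token
EOS = "<|endoftext|>"  # and the assistant reply ends with it


def render_chat(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts into the chat format.

    Only the "user" and "assistant" roles shown in this card are handled.
    """
    parts = [BOS]
    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"]
        if role == "user":
            parts.append(f"<|user|>\n{content}\n")
        elif role == "assistant":
            parts.append(f"<|assistant|>\n{content}{EOS}")
            # Assumption: later turns are separated by a newline.
            if i != len(messages) - 1:
                parts.append("\n")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = render_chat([
    {"role": "user", "content": "How are you doing?"},
    {"role": "assistant", "content": "I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?"},
])
print(example)
```

Calling `render_chat` with only a user message and `add_generation_prompt=True` yields a prompt that ends in an open `<|assistant|>` turn, which is the shape typically fed to the model for generation.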

System prompt

In the Ai2 demos, we use the following system prompt by default:

You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI.

The model was not trained with a specific system prompt in mind.
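
Because no particular system prompt was assumed during training, supplying one is optional. The sketch below builds a message list with or without the demo's default prompt. `build_messages` is a hypothetical helper, and passing the prompt as a "system" role entry follows the common Transformers chat-message convention rather than anything this card states explicitly.

```python
# Default system prompt used in the Ai2 demo (quoted from this card).
DEFAULT_SYSTEM_PROMPT = (
    "You are OLMo 2, a helpful and harmless AI Assistant "
    "built by the Allen Institute for AI."
)


def build_messages(user_text, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Return a chat message list, with the system prompt first if given.

    Pass system_prompt=None to omit the system message entirely.
    """
    messages = []
    if system_prompt:
        # Assumption: the system prompt is passed as a "system" role message.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages


messages = build_messages("How are you doing?")
# The list can then be handed to the tokenizer, for example:
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )
```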

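Bias, risks, and limitations

The OLMo-2 models have limited safety training and, unlike ChatGPT, are not deployed with in-the-loop filtering of responses, so the model can produce problematic outputs (especially when prompted to do so). For examples of this, see the Falcon 180B model card.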

Performance

Model	Average	AlpacaEval	BBH	DROP	GSM8k	IFEval	MATH	MMLU	Safety	PopQA	TruthQA
Open weights models
Gemma-2-9B-it	51.9	43.7	2.5	58.8	79.7	69.9	29.8	69.1	75.5	28.3	61.4
Ministral-8B-Instruct	52.1	31.4	56.2	56.2	80.0	56.4	40.0	68.5	56.2	20.2	55.5
Mistral-Nemo-Instruct-2407	50.9	45.8	54.6	23.6	81.4	64.5	31.9	70.0	52.7	26.9	57.7
Qwen-2.5-7B-Instruct	57.1	29.7	25.3	54.4	83.8	74.7	69.9	76.6	75.0	18.1	63.1
Llama-3.1-8B-Instruct	58.9	25.8	69.7	61.7	83.4	80.6	42.5	71.3	70.2	28.4	55.1
Tülu 3 8B	60.4	34.0	66.0	62.6	87.6	82.4	43.7	68.2	75.4	29.1	55.0
Qwen-2.5-14B-Instruct	60.8	34.6	34.0	50.5	83.9	82.4	70.6	81.1	79.3	21.1	70.8
Fully open models
OLMo-7B-Instruct	28.2	5.2	35.3	30.7	14.3	32.2	2.1	46.3	54.0	17.1	44.5
OLMo-7B-0424-Instruct	33.1	8.5	34.4	47.9	23.2	39.2	5.2	48.9	49.3	18.9	55.2
OLMoE-1B-7B-0924-Instruct	35.5	8.5	37.2	34.3	47.2	46.2	8.4	51.6	51.6	20.6	49.1
MAP-Neo-7B-Instruct	42.9	17.6	26.4	48.2	69.4	35.9	31.5	56.5	73.7	18.4	51.6
OLMo-2-7B-SFT	50.2	10.2	49.7	59.6	74.6	66.9	25.3	61.1	82.1	23.6	48.6
OLMo-2-7B-DPO	54.2	27.9	46.7	60.2	82.6	73.0	30.3	60.8	81.0	23.5	56.0
OLMo-2-13B-SFT	55.3	11.5	59.6	71.3	76.3	68.6	29.5	68.0	82.3	29.4	57.1
OLMo-2-13B-DPO	60.6	38.3	57.9	71.5	82.3	80.2	35.2	67.9	79.7	29.0	63.9
OLMo-2-7B-1124-Instruct	54.8	29.1	46.6	60.5	85.1	72.3	32.5	61.3	80.6	23.2	56.5
OLMo-2-13B-1124-Instruct	62.0	39.5	58.8	71.5	87.4	82.6	39.2	68.5	79.1	28.8	64.3
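
The Average column appears to be the unweighted mean of the ten benchmark columns, rounded to one decimal place. The card does not state this, so treat it as an inference from the numbers. A quick check on the two OLMo 2 Instruct rows, with `mean_score` as a hypothetical helper name:

```python
# Benchmark scores copied from the two OLMo 2 Instruct rows of the table,
# in column order: AlpacaEval, BBH, DROP, GSM8k, IFEval, MATH, MMLU,
# Safety, PopQA, TruthQA.
scores = {
    "OLMo-2-7B-1124-Instruct": [29.1, 46.6, 60.5, 85.1, 72.3, 32.5, 61.3, 80.6, 23.2, 56.5],
    "OLMo-2-13B-1124-Instruct": [39.5, 58.8, 71.5, 87.4, 82.6, 39.2, 68.5, 79.1, 28.8, 64.3],
}


def mean_score(values):
    """Unweighted mean of the scores, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


averages = {name: mean_score(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(name, avg)
```

Both results match the Average values listed in the table (54.8 and 62.0).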

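License and use

OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix that includes outputs generated by third-party models and is therefore subject to additional terms: Gemma Terms of Use.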

Citation

@article{olmo20242olmo2furious,
      title={2 OLMo 2 Furious}, 
      author={Team OLMo and Pete Walsh and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Shane Arora and Akshita Bhagia and Yuling Gu and Shengyi Huang and Matt Jordan and Nathan Lambert and Dustin Schwenk and Oyvind Tafjord and Taira Anderson and David Atkinson and Faeze Brahman and Christopher Clark and Pradeep Dasigi and Nouha Dziri and Michal Guerquin and Hamish Ivison and Pang Wei Koh and Jiacheng Liu and Saumya Malik and William Merrill and Lester James V. Miranda and Jacob Morrison and Tyler Murray and Crystal Nam and Valentina Pyatkin and Aman Rangapur and Michael Schmitz and Sam Skjonsberg and David Wadden and Christopher Wilhelm and Michael Wilson and Luke Zettlemoyer and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
      year={2024},
      eprint={2501.00656},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.00656}, 
}