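OLMo 2 32B DPO March 2025 is a post-trained variant of the OLMo-2 32B March 2025 model. It has undergone supervised finetuning on an OLMo-specific variant of the Tülu 3 dataset, followed by DPO training on this dataset. Tülu 3 is designed for state-of-the-art performance on a diverse set of tasks beyond chat, such as MATH, GSM8K, and IFEval. For more details, see the OLMo 2 paper or the Tülu 3 paper!
OLMo is a series of Open Language Models designed to advance the science of language models. These models are trained on the Dolma dataset. We publicly release all code, checkpoints, logs, and associated training details.
OLMo 2 will be supported in the next version of Transformers, so you need to install it from the main branch using: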
pip install --upgrade git+https://github.com/huggingface/transformers.git
To load the model with HuggingFace, use the following snippet:
from transformers import AutoModelForCausalLM
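olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-0325-32B-DPO")
Note: this model differs from previous OLMo 2 and Tülu 3 models because of a minor change in configuration. It does not have a bos token before the rest of the content. Our other models have <|endoftext|> at the beginning of the chat template.
The chat template for our models is formatted as:
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
Or with new lines expanded: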
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
It is also embedded in the tokenizer, for use with tokenizer.apply_chat_template.
In Ai2 demos, we use this system prompt by default:
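As a rough illustration of the format above, the following is a minimal hand-written sketch that renders a single user/assistant exchange as a string. The function name `render_chat` and its `add_generation_prompt` argument are hypothetical helpers for this example; the template embedded in the tokenizer remains the authoritative version, and how it separates multiple turns is not specified in this card.

```python
from typing import Dict, List

EOS_TOKEN = "<|endoftext|>"  # end-of-text marker shown in the format above


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Approximate the chat format shown above (note: no leading bos token).

    This is an illustrative sketch, not the tokenizer's actual template.
    Only the "user" and "assistant" roles shown in this card are handled.
    """
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            parts.append(f"<|user|>\n{content}\n")
        elif role == "assistant":
            parts.append(f"<|assistant|>\n{content}{EOS_TOKEN}")
        else:
            raise ValueError(f"role not covered by this sketch: {role!r}")
    if add_generation_prompt:
        # Leave the assistant turn open so a model could continue from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "How are you doing?"},
    {"role": "assistant", "content": "I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?"},
]
rendered = render_chat(example)
print(rendered)
```

In practice, passing the same list of role/content dictionaries to tokenizer.apply_chat_template is the safer route, since it uses the template shipped with the model.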
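You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI.
The model was not trained with a specific system prompt in mind.
The OLMo-2 models have limited safety training and, unlike ChatGPT, are not deployed with automatic in-the-loop filtering of responses, so the model can produce problematic outputs (especially when explicitly prompted to do so). See the Falcon 180B model card for an example of this.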
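One way to reproduce the demo behavior is to prepend the system prompt to the conversation before it is templated. The sketch below assumes the role/content dictionary convention used by tokenizer.apply_chat_template; the helper name `with_system_prompt` is hypothetical, and whether the model's template renders a system role as intended should be checked against the tokenizer itself.

```python
from typing import Dict, List

# System prompt quoted from this card (the default in Ai2 demos).
DEFAULT_SYSTEM_PROMPT = (
    "You are OLMo 2, a helpful and harmless AI Assistant "
    "built by the Allen Institute for AI."
)


def with_system_prompt(
    messages: List[Dict[str, str]],
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
) -> List[Dict[str, str]]:
    """Return a copy of `messages` with a system message in front.

    If the conversation already starts with a system message, it is
    returned unchanged (as a new list) so the caller's choice wins.
    """
    if messages and messages[0]["role"] == "system":
        return list(messages)
    return [{"role": "system", "content": system_prompt}] + list(messages)


conversation = with_system_prompt([{"role": "user", "content": "How are you doing?"}])
print(conversation)
```

Because the model was not trained around any particular system prompt, this step is optional.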
| Model | Average | AlpacaEval 2 LC | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Closed API models | | | | | | | | | | | |
| GPT-3.5 Turbo 0125 | 59.6 | 38.7 | 66.6 | 70.2 | 74.3 | 66.9 | 41.2 | 70.2 | 69.1 | 45.0 | 62.9 |
| GPT 4o Mini 2024-07-18 | 65.7 | 49.7 | 65.9 | 36.3 | 83.0 | 83.5 | 67.9 | 82.2 | 84.9 | 39.0 | 64.8 |
| Open weights models | | | | | | | | | | | |
| Mistral-Nemo-Instruct-2407 | 50.9 | 45.8 | 54.6 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 |
| Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 |
| Gemma-2-27b-it | 61.3 | 49.0 | 72.7 | 67.5 | 80.7 | 63.2 | 35.1 | 70.7 | 75.9 | 33.9 | 64.6 |
| Qwen2.5-32B | 66.5 | 39.1 | 82.3 | 48.3 | 87.5 | 82.4 | 77.9 | 84.7 | 82.4 | 26.1 | 70.6 |
| Mistral-Small-24B | 67.6 | 43.2 | 80.1 | 78.5 | 87.2 | 77.3 | 65.9 | 83.7 | 66.5 | 24.4 | 68.1 |
| Llama-3.1-70B | 70.0 | 32.9 | 83.0 | 77.0 | 94.5 | 88.0 | 56.2 | 85.2 | 76.4 | 46.5 | 66.8 |
| Llama-3.3-70B | 73.0 | 36.5 | 85.8 | 78.0 | 93.6 | 90.8 | 71.8 | 85.9 | 70.4 | 48.2 | 66.1 |
| Gemma-3-27b-it | - | 63.4 | 83.7 | 69.2 | 91.1 | - | - | 81.8 | - | 30.9 | - |
| Fully open models | | | | | | | | | | | |
| OLMo-2-7B-1124-Instruct | 55.7 | 31.0 | 48.5 | 58.9 | 85.2 | 75.6 | 31.3 | 63.9 | 81.2 | 24.6 | 56.3 |
| OLMo-2-13B-1124-Instruct | 61.4 | 37.5 | 58.4 | 72.1 | 87.4 | 80.4 | 39.7 | 68.6 | 77.5 | 28.8 | 63.9 |
| OLMo-2-32B-0325-SFT | 61.7 | 16.9 | 69.7 | 77.2 | 78.4 | 72.4 | 35.9 | 76.1 | 93.8 | 35.4 | 61.3 |
| OLMo-2-32B-0325-DPO | 68.8 | 44.1 | 70.2 | 77.5 | 85.7 | 83.8 | 46.8 | 78.0 | 91.9 | 36.4 | 73.5 |
| OLMo-2-32B-0325-Instruct | 68.8 | 42.8 | 70.6 | 78.0 | 87.6 | 85.6 | 49.7 | 77.3 | 85.9 | 37.5 | 73.2 |
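The card does not state how the average column is computed. For the three OLMo 2 32B rows, the reported values are consistent with an unweighted mean of the ten task columns, rounded to one decimal place. The short check below uses scores copied from the table; the helper name `mean_score` is hypothetical, and the averaging rule is an inference from the numbers, not a documented method.

```python
from typing import Dict, List

# Task scores copied from the table, in column order:
# AlpacaEval 2 LC, BBH, DROP, GSM8k, IFEval, MATH, MMLU, Safety, PopQA, TruthQA
SCORES: Dict[str, List[float]] = {
    "OLMo-2-32B-0325-SFT": [16.9, 69.7, 77.2, 78.4, 72.4, 35.9, 76.1, 93.8, 35.4, 61.3],
    "OLMo-2-32B-0325-DPO": [44.1, 70.2, 77.5, 85.7, 83.8, 46.8, 78.0, 91.9, 36.4, 73.5],
    "OLMo-2-32B-0325-Instruct": [42.8, 70.6, 78.0, 87.6, 85.6, 49.7, 77.3, 85.9, 37.5, 73.2],
}


def mean_score(scores: List[float]) -> float:
    """Unweighted mean of the task scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


averages = {name: mean_score(values) for name, values in SCORES.items()}
for name, value in averages.items():
    print(f"{name}: {value}")
```

Under this reading, the DPO and Instruct checkpoints tie on the average even though their per-task scores differ, for example on Safety and MATH.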
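OLMo 2 is licensed under the Apache 2.0 license. OLMo 2 is intended for research and educational use. For more information, please see our Responsible Use Guidelines. This model has been fine-tuned using a dataset mix that includes outputs generated by third-party models and is therefore subject to additional terms: Gemma Terms of Use.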
@article{olmo20242olmo2furious,
title={2 OLMo 2 Furious},
author={Team OLMo and Pete Walsh and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Shane Arora and Akshita Bhagia and Yuling Gu and Shengyi Huang and Matt Jordan and Nathan Lambert and Dustin Schwenk and Oyvind Tafjord and Taira Anderson and David Atkinson and Faeze Brahman and Christopher Clark and Pradeep Dasigi and Nouha Dziri and Michal Guerquin and Hamish Ivison and Pang Wei Koh and Jiacheng Liu and Saumya Malik and William Merrill and Lester James V. Miranda and Jacob Morrison and Tyler Murray and Crystal Nam and Valentina Pyatkin and Aman Rangapur and Michael Schmitz and Sam Skjonsberg and David Wadden and Christopher Wilhelm and Michael Wilson and Luke Zettlemoyer and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2024},
eprint={2501.00656},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.00656},
}