DeepSeek-Prover-V1.5：利用证明助手反馈进行强化学习和蒙特卡洛树搜索

1. 引言

我们介绍了DeepSeek-Prover-V1.5，一个开源的语言模型，专为Lean 4定理证明设计，通过优化训练和推理过程来增强DeepSeek-Prover-V1。该模型在DeepSeekMath-Base上预训练，专注于形式化数学语言，然后使用从DeepSeek-Prover-V1派生的增强形式化定理证明数据集进行监督微调。通过从证明助手反馈中学习强化学习（RLPAF），模型进一步得到细化。除了DeepSeek-Prover-V1的单次完整证明生成方法外，我们还提出了RMaxTS，一种使用内在奖励驱动的探索策略的蒙特卡洛树搜索变体，以生成多样化的证明路径。DeepSeek-Prover-V1.5在高中级别miniF2F基准测试（63.5%）和本科级别ProofNet基准测试（25.3%）上取得了新的最先进结果。

2. 评估结果

	miniF2F-test	ProofNet
ReProver	26.5%	13.8%
GPT-f	36.6%	-
Hypertree Proof Search	41.0%	-
InternLM2-StepProver	54.5%	18.1%
DeepSeek-Prover-V1	50.0%	-
DeepSeek-Prover-V1.5-Base	42.2%	13.2%
DeepSeek-Prover-V1.5-SFT	57.4%	22.9%
DeepSeek-Prover-V1.5-RL	60.2%	22.6%
DeepSeek-Prover-V1.5-RL + RMaxTS	63.5%	25.3%

3. 模型下载

我们向公众发布了具有70亿参数的DeepSeek-Prover-V1.5，包括基础、SFT和RL模型。

模型	下载
DeepSeek-Prover-V1.5-Base	🤗 HuggingFace
DeepSeek-Prover-V1.5-SFT	🤗 HuggingFace
DeepSeek-Prover-V1.5-RL	🤗 HuggingFace

4. 许可证

此代码库遵循MIT许可证。DeepSeekMath模型的使用需遵守模型许可证。DeepSeekMath支持商业用途。

查看LICENSE-CODE和LICENSE-MODEL了解更多详情。

5. 引用

@article{xin2024deepseekproverv15harnessingproofassistant,
      title={DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search}, 
      author={Huajian Xin and Z. Z. Ren and Junxiao Song and Zhihong Shao and Wanjia Zhao and Haocheng Wang and Bo Liu and Liyue Zhang and Xuan Lu and Qiushi Du and Wenjun Gao and Qihao Zhu and Dejian Yang and Zhibin Gou and Z. F. Wu and Fuli Luo and Chong Ruan},
      year={2024},
      eprint={2408.08152},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.08152}, 
}

6. 联系方式

如果您有任何疑问，请提出一个问题或通过 service@deepseek.com 与我们联系。

1. 引言

2. 评估结果

	miniF2F-test	ProofNet
ReProver	26.5%	13.8%
GPT-f	36.6%	-
Hypertree Proof Search	41.0%	-
InternLM2-StepProver	54.5%	18.1%
DeepSeek-Prover-V1	50.0%	-
DeepSeek-Prover-V1.5-Base	42.2%	13.2%
DeepSeek-Prover-V1.5-SFT	57.4%	22.9%
DeepSeek-Prover-V1.5-RL	60.2%	22.6%
DeepSeek-Prover-V1.5-RL + RMaxTS	63.5%	25.3%

模型

下载

DeepSeek-Prover-V1.5-Base

🤗 HuggingFace

DeepSeek-Prover-V1.5-SFT

🤗 HuggingFace

DeepSeek-Prover-V1.5-RL

🤗 HuggingFace

5. 引用

@article{xin2024deepseekproverv15harnessingproofassistant,
      title={DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search}, 
      author={Huajian Xin and Z. Z. Ren and Junxiao Song and Zhihong Shao and Wanjia Zhao and Haocheng Wang and Bo Liu and Liyue Zhang and Xuan Lu and Qiushi Du and Wenjun Gao and Qihao Zhu and Dejian Yang and Zhibin Gou and Z. F. Wu and Fuli Luo and Chong Ruan},
      year={2024},
      eprint={2408.08152},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.08152}, 
}