🤗 Baichuan-M1-14B-Base • 🤗 Baichuan-M1-14B-Instruct • 📗 Technical Report • 💬 WeChat Group
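Baichuan-M1-14B is the industry's first open-source medical large language model, developed in-house by Baichuan Intelligence. It delivers strong performance in the medical domain while retaining solid general capabilities: on most general benchmarks it performs on par with models of similar size, and in medical scenarios it outperforms models with roughly five times as many parameters (for example, Qwen2.5-72B-Instruct in the evaluation table below). Its core features are as follows: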
We carried out careful data collection and curation specifically for the medical domain.
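We adopt an innovative combination of multi-stage curriculum learning and alignment optimization, which systematically improves the model through the following two parts:
Multi-stage curriculum learning: training is divided into three stages that progressively strengthen the model's general and medical capabilities.
Alignment optimization: reinforcement learning and optimization on pairwise data improve generation quality, logical reasoning, and alignment with user preferences.
Combining multi-stage training with alignment optimization enables the model to perform strongly in both general and medical domains.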
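
The text above does not say which pairwise objective is used. Purely as an illustration of what optimizing on pairwise (chosen vs. rejected) data can look like, here is a minimal sketch of a DPO-style loss for a single preference pair. The function name `dpo_loss`, its parameters, and the value `beta=0.1` are assumptions made for this example and are not taken from the Baichuan-M1 training recipe.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    the policy being trained or under a frozen reference model.
    beta is a hypothetical temperature chosen for illustration.
    """
    # How much more the policy likes each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifts probability toward the chosen response: the loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy raises the likelihood of the preferred response relative to the reference model, and rises when it favors the rejected one.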
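Our evaluation covers the mainstream benchmarks, on both open-source and closed-source evaluation sets. In the table below, Baichuan-M1-14B-Instruct reaches an average score of 72.23, ahead of Qwen2.5-72B-Instruct (70.51) and close to claude-3.5-sonnet-20241022 (74.85) and gpt-4o (75.00), showing strong capability in medical scenarios while maintaining solid general performance.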
| Category | Benchmark | Baichuan-M1-14B-Instruct | Qwen2.5-14B-Instruct | Qwen2.5-72B-Instruct | claude-3.5-sonnet-20241022 | gpt-4o |
|---|---|---|---|---|---|---|
| Average score | | 72.23 | 65.39 | 70.51 | 74.85 | 75.00 |
| Clinical practice | cmbclin | 77.40 | 71.51 | 75.36 | 78.37 | 75.36 |
| | clinicalbench_diag | 70.90 | 68.85 | 72.23 | 75.00 | 73.05 |
| | clinicalbench_hos | 70.05 | 68.83 | 70.53 | 65.58 | 69.38 |
| | clinicalbench_treat | 56.38 | 55.03 | 57.30 | 64.03 | 59.35 |
| | rarearena_rdc | 81.80 | 66.40 | 76.20 | 89.60 | 88.40 |
| | rarearena_rds | 54.00 | 42.60 | 49.80 | 59.80 | 57.20 |
| | rarebench | 59.60 | 52.80 | 60.60 | 65.30 | 62.80 |
| Qualification exams | cmexam | 80.10 | 77.70 | 82.70 | 77.50 | 78.00 |
| | Pediatric Qualification Exam | 78.48 | 74.68 | 84.81 | 76.58 | 78.48 |
| | Internal Medicine Qualification Exam | 83.42 | 86.10 | 87.17 | 87.70 | 83.42 |
| | General Practice Qualification Exam | 87.07 | 88.44 | 88.44 | 81.63 | 84.35 |
| | USMLE | 78.00 | 67.20 | 76.70 | 85.90 | 87.10 |
| | medbullets | 66.88 | 54.22 | 64.29 | 72.40 | 75.97 |
| | mediq | 83.40 | 66.80 | 79.90 | 88.80 | 90.20 |
| | nejmqa | 49.75 | 45.69 | 50.76 | 69.54 | 54.31 |
| | pubmedqa | 75.20 | 76.40 | 75.60 | 77.00 | 77.60 |
| | redisqa | 74.50 | 69.70 | 75.00 | 83.20 | 82.80 |
| Fundamental capabilities | mednli_dis | 80.40 | 68.90 | 74.90 | 58.30 | 79.80 |
| | medcalc | 56.00 | 31.40 | 37.90 | 52.60 | 49.00 |
| | MMLU-anatomy | 80.00 | 67.41 | 71.11 | 86.67 | 91.11 |
| | MMLU-virology | 54.82 | 56.02 | 53.01 | 54.22 | 57.23 |
| | MMLU-genetics | 91.00 | 82.00 | 87.00 | 97.00 | 95.00 |
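
The table does not state how the average row is computed. The reported figures are consistent with an unweighted mean over the 22 benchmark rows, which the sketch below checks for two of the columns. The scores are copied from the table; the names `SCORES` and `column_mean` are introduced here for illustration only.

```python
from statistics import mean

# Per-benchmark scores copied from the table:
# (Baichuan-M1-14B-Instruct, Qwen2.5-72B-Instruct)
SCORES = {
    "cmbclin": (77.40, 75.36),
    "clinicalbench_diag": (70.90, 72.23),
    "clinicalbench_hos": (70.05, 70.53),
    "clinicalbench_treat": (56.38, 57.30),
    "rarearena_rdc": (81.80, 76.20),
    "rarearena_rds": (54.00, 49.80),
    "rarebench": (59.60, 60.60),
    "cmexam": (80.10, 82.70),
    "Pediatric Qualification Exam": (78.48, 84.81),
    "Internal Medicine Qualification Exam": (83.42, 87.17),
    "General Practice Qualification Exam": (87.07, 88.44),
    "USMLE": (78.00, 76.70),
    "medbullets": (66.88, 64.29),
    "mediq": (83.40, 79.90),
    "nejmqa": (49.75, 50.76),
    "pubmedqa": (75.20, 75.60),
    "redisqa": (74.50, 75.00),
    "mednli_dis": (80.40, 74.90),
    "medcalc": (56.00, 37.90),
    "MMLU-anatomy": (80.00, 71.11),
    "MMLU-virology": (54.82, 53.01),
    "MMLU-genetics": (91.00, 87.00),
}


def column_mean(index):
    """Unweighted mean of one model's scores, rounded to two decimals."""
    return round(mean(row[index] for row in SCORES.values()), 2)


baichuan_avg = column_mean(0)
qwen72b_avg = column_mean(1)
# Number of benchmarks where the 14B model scores strictly higher.
wins = sum(b > q for b, q in SCORES.values())

print(f"Baichuan-M1-14B-Instruct average: {baichuan_avg}")
print(f"Qwen2.5-72B-Instruct average:     {qwen72b_avg}")
print(f"Baichuan-M1-14B-Instruct ahead on {wins} of {len(SCORES)} benchmarks")
```

The two means reproduce the 72.23 and 70.51 shown in the average row. The head-to-head count is a useful complement, since an average can hide which individual benchmarks drive the difference.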
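We recommend using the latest version of the Transformers library (4.47.0 or later). The following code snippet shows how to load the Baichuan-M1-14B-Base model and generate text with it: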

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    # 1. Load the pre-trained model and tokenizer
    model_name = "baichuan-inc/Baichuan-M1-14B-Base"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, torch_dtype=torch.bfloat16
    ).cuda()

    # 2. Tokenize the input text and move it to the model's device
    input_text = "I have recently recovered from my cold."
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

    # 3. Generate a continuation (max_length counts the prompt tokens too)
    outputs = model.generate(
        inputs["input_ids"],
        max_length=100,
    )

    # 4. Decode and print the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print("Generated Text:")
    print(generated_text)

Use of the model is subject to the Baichuan-M1-14B Model Community License Agreement.
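The Baichuan development team has not built any commercial applications on top of this model. All users must comply with applicable laws and regulations and must not use the model for purposes that endanger national security or are otherwise illegal.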
If you need to cite our work, please use the following reference:
@article{baichuan-m1-2025,
title={Baichuan-M1: Pushing the Medical Capability of Large Language Models},
author={Bingning Wang and Haizhou Zhao and Huozhi Zhou and Liang Song and Mingyu Xu and Wei Cheng and Xiangrong Zeng and Yupeng Zhang and Yuqi Huo and Zecheng Wang and Zhengyun Zhao and others},
journal={arXiv preprint arXiv:2502.12671},
year={2025}
}