MiniCPM3-4B

MiniCPM Repo | MiniCPM Paper | MiniCPM-V Repo | Join our Discord and WeChat groups

Introduction

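MiniCPM3-4B is the third generation of the MiniCPM series. Averaged over the benchmarks under Evaluation Results, it scores higher than Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125 and is comparable to many recent 7B~9B models.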

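Compared with MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a stronger and more versatile skill set, which makes it suitable for a wider range of general-purpose scenarios. MiniCPM3-4B supports function calling and a code interpreter. See Advanced Features for usage guidelines.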

MiniCPM3-4B has a 32k context window. With LLMxMapReduce, it can in theory handle input of unlimited length without requiring a large amount of memory.
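The paragraph above only names LLMxMapReduce; its actual algorithm is described in its own paper and repository and is not reproduced here. As a rough intuition for why a chunk-and-combine scheme keeps memory bounded, the following is a generic map-reduce sketch in plain Python. `chunk`, `map_reduce`, and the toy `map_fn` are hypothetical names, and `map_fn` stands in for a model call that condenses one chunk. No single call ever sees more than `window` tokens, however long the input is.

```python
from typing import Callable, List

Tokens = List[str]


def chunk(tokens: Tokens, window: int) -> List[Tokens]:
    """Split tokens into consecutive pieces of at most `window` tokens."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def map_reduce(tokens: Tokens, window: int, map_fn: Callable[[Tokens], Tokens]) -> Tokens:
    """Condense an arbitrarily long input until it fits one window, then process it once more.

    `map_fn` is a hypothetical stand-in for a model call (e.g. "keep what is relevant
    to the question"); it is never given more than `window` tokens at a time.
    """
    if window < 1:
        raise ValueError("window must be positive")
    while len(tokens) > window:
        # Map: handle every chunk independently (these calls could run in parallel).
        partials = [map_fn(piece) for piece in chunk(tokens, window)]
        # Reduce: concatenate the partial results into a new, shorter input.
        merged = [tok for part in partials for tok in part]
        if len(merged) >= len(tokens):
            raise ValueError("map_fn did not shrink the input, so the loop cannot converge")
        tokens = merged
    return map_fn(tokens)


numbers = [str(i) for i in range(100)]
# Toy map_fn: keep only the largest number in the chunk.
print(map_reduce(numbers, window=10, map_fn=lambda piece: [max(piece, key=int)]))  # ['99']
```

The toy task works because "largest number" survives condensing exactly; for real text, information that spans chunk boundaries must survive the map step, which is presumably where a dedicated method such as LLMxMapReduce does its real work.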

Usage

Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "openbmb/MiniCPM3-4B"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)

messages = [
    {"role": "user", "content": "推荐5个北京的景点。"},
]
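# Render the conversation with the model's chat template and append the assistant prefix.
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)

# top_p and temperature only take effect when sampling is enabled.
model_outputs = model.generate(
    model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    top_p=0.7,
    temperature=0.7
)

# generate() returns prompt + completion; keep only the newly generated tokens.
output_token_ids = [
    model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
]

response = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
print(response)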
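Both inference examples sample with `top_p=0.7` and `temperature=0.7`. As a rough illustration of what these two settings do to the next-token distribution, here is a simplified pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. The function name and the toy logits are hypothetical. Real implementations in Transformers and vLLM operate on tensors and differ in edge cases such as ties and the minimum number of tokens kept.

```python
import math
from typing import Dict


def temperature_top_p(logits: Dict[str, float], temperature: float, top_p: float) -> Dict[str, float]:
    """Return the renormalized distribution that nucleus sampling would draw from (simplified)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits by T before the softmax; T < 1 sharpens the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # subtract max for stability
    z = sum(exps.values())
    probs = {tok: e / z for tok, e in exps.items()}
    # Top-p: keep the most likely tokens until their cumulative probability reaches top_p.
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


logits = {"Beijing": 2.0, "Shanghai": 1.0, "Paris": 0.0}
print(sorted(temperature_top_p(logits, temperature=1.0, top_p=0.7)))  # ['Beijing', 'Shanghai']
print(sorted(temperature_top_p(logits, temperature=0.7, top_p=0.7)))  # ['Beijing']
```

Lowering the temperature concentrates probability on the top token, so the same `top_p` keeps fewer candidates.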

Inference with vLLM

For now, you need to install our fork of vLLM.

pip install git+https://github.com/OpenBMB/vllm.git@minicpm3

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM3-4B"
prompt = [{"role": "user", "content": "推荐5个北京的景点。"}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Render the chat template to a plain string; vLLM tokenizes string prompts itself.
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    tensor_parallel_size=1  # number of GPUs to shard the model across
)
sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
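The vLLM example also sets `repetition_penalty=1.02`. vLLM documents this parameter as penalizing tokens that already appear in the prompt or generated text, with values above 1 encouraging new tokens. A common formulation (introduced with CTRL and used by Transformers' repetition-penalty processor) divides the positive logits of already-seen tokens by the penalty and multiplies their negative logits by it. The sketch below assumes that rule; the function name is hypothetical and the code is not taken from the vLLM source.

```python
from typing import List, Set


def apply_repetition_penalty(logits: List[float], seen: Set[int], penalty: float) -> List[float]:
    """Return a copy of `logits` with already-seen token ids penalized (CTRL-style rule)."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for tok in seen:
        # With penalty > 1, dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


logits = [2.0, -1.0, 0.5]  # toy vocabulary of three token ids
seen = {0, 1}              # ids that already appeared in the prompt or output
print(apply_repetition_penalty(logits, seen, penalty=2.0))  # [1.0, -2.0, 0.5]
```

With the value used above (1.02) the effect is mild: a seen token's logit of 2.0 drops only to about 1.96.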

Evaluation Results

Benchmark	Qwen2-7B-Instruct	GLM-4-9B-Chat	Gemma2-9B-it	Llama3.1-8B-Instruct	GPT-3.5-Turbo-0125	Phi-3.5-mini-Instruct (3.8B)	MiniCPM3-4B
English
MMLU	70.5	72.4	72.6	69.4	69.2	68.4	67.2
BBH	64.9	76.3	65.2	67.8	70.3	68.6	70.2
MT-Bench (out of 10)	8.41	8.35	7.88	8.28	8.17	8.60	8.41
IFEVAL (Prompt Strict-Acc.)	51.0	64.5	71.9	71.5	58.8	49.4	68.4
Chinese
CMMLU	80.9	71.5	59.5	55.8	54.5	46.9	73.3
CEVAL	77.2	75.6	56.7	55.2	52.8	46.1	73.6
AlignBench v1.1 (out of 10)	7.10	6.61	7.10	5.68	5.82	5.73	6.74
FollowBench-zh (SSR)	63.0	56.4	57.0	50.6	64.6	58.1	66.8
Math
MATH	49.6	50.6	46.0	51.9	41.8	46.4	46.6
GSM8K	82.3	79.6	79.7	84.5	76.4	82.7	81.1
MathBench	63.4	59.4	45.8	54.3	48.9	54.9	65.6
Code
HumanEval+	70.1	67.1	61.6	62.8	66.5	68.9	68.3
MBPP+	57.1	62.2	64.3	55.3	71.4	55.8	63.2
LiveCodeBench v3	22.2	20.2	19.2	20.4	24.0	19.6	22.6
Function Call
BFCL v2	71.6	70.1	19.2	73.3	75.4	48.4	76.0
Overall
Average	65.3	65.0	57.9	60.8	61.0	57.2	66.3
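The source does not say how the Average row is computed. Working backwards from the table, every reported average is reproduced to one decimal if the two benchmarks scored out of 10 (MT-Bench and AlignBench v1.1) are multiplied by 10 and a plain mean is then taken over all 15 benchmark rows. The sketch below checks this for the MiniCPM3-4B column. Treat the scaling rule (the hypothetical `TEN_POINT` set) as our inference, not the authors' stated method.

```python
from typing import Dict

# Scores copied from the Evaluation Results table (MiniCPM3-4B column).
scores: Dict[str, float] = {
    "MMLU": 67.2, "BBH": 70.2, "MT-Bench": 8.41, "IFEVAL": 68.4,
    "CMMLU": 73.3, "CEVAL": 73.6, "AlignBench v1.1": 6.74, "FollowBench-zh": 66.8,
    "MATH": 46.6, "GSM8K": 81.1, "MathBench": 65.6,
    "HumanEval+": 68.3, "MBPP+": 63.2, "LiveCodeBench v3": 22.6,
    "BFCL v2": 76.0,
}

# Assumption (inferred, not stated in the source): 10-point benchmarks are scaled by 10.
TEN_POINT = {"MT-Bench", "AlignBench v1.1"}


def table_average(column: Dict[str, float]) -> float:
    """Plain mean over all benchmarks after rescaling the 10-point ones to a 100-point range."""
    normalized = [v * 10 if name in TEN_POINT else v for name, v in column.items()]
    return sum(normalized) / len(normalized)


print(round(table_average(scores), 1))  # 66.3
```

Without the rescaling, MT-Bench and AlignBench would contribute single-digit values and pull every average far below the reported numbers.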

Statement

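As a language model, MiniCPM3-4B generates content by learning from a vast amount of text.
However, it does not understand that content, and it cannot express personal opinions or make value judgments.
Nothing generated by MiniCPM3-4B represents the views or positions of the model developers.
Users are therefore responsible for evaluating and verifying any content generated by MiniCPM3-4B before using it.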

License

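This repository is released under the Apache-2.0 License.
Use of the MiniCPM3-4B model weights must strictly follow MiniCPM Model License.md.
The MiniCPM3-4B models and weights are completely free for academic research. After registering by filling out the "questionnaire", they are also free for commercial use.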

Citation

@article{hu2024minicpm,
  title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies},
  author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others},
  journal={arXiv preprint arXiv:2404.06395},
  year={2024}
}