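MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the model's long-context capabilities, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), MiniMax-Text-01's training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during inference. On a range of academic benchmarks, MiniMax-Text-01 also performs at the level of top-tier models.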
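To put the headline figures in proportion, the short sketch below computes the share of parameters active for each token and the ratio between the inference and training context lengths. It uses only the numbers quoted above; the variable and function names are chosen here for illustration.

```python
# Figures quoted in the paragraph above.
TOTAL_PARAMS = 456e9        # total parameters
ACTIVE_PARAMS = 45.9e9      # parameters activated per token
TRAIN_CONTEXT = 1_000_000   # training context length, in tokens
INFER_CONTEXT = 4_000_000   # maximum inference context, in tokens


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
extrapolation = INFER_CONTEXT / TRAIN_CONTEXT

print(f"active per token: {fraction:.1%}")
print(f"inference/training context ratio: {extrapolation:.0f}x")
```

Roughly one tenth of the weights take part in each forward step, which is the usual motivation for an MoE design at this scale.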
The architecture of MiniMax-Text-01 is briefly described as follows:
| Task | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2.5-72B-Inst. | DeepSeek-V3 | Llama-3.1-405B-Inst. | MiniMax-Text-01 |
|---|---|---|---|---|---|---|---|---|
| General | | | | | | | | |
| MMLU* | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | 88.6 | 88.5 |
| MMLU-Pro* | 74.4 | 78.0 | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| SimpleQA | 39.0 | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | 67.4 |
| IFEval (avg) | 84.1 | 90.1 | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| Arena-Hard | 92.4 | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| Reasoning | | | | | | | | |
| GPQA* (diamond) | 46.0 | 65.0 | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| DROP* (F1) | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | 92.5 | 87.8 |
| Mathematics | | | | | | | | |
| GSM8k* | 95.6 | 96.9 | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| MATH* | 76.6 | 74.1 | 84.6 | 83.9 | 81.8 | 84.6 | 73.8 | 77.4 |
| Coding | | | | | | | | |
| MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | 78.8 | 73.0 | 71.7 |
| HumanEval | 90.2 | 93.7 | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |
* Evaluated with a 0-shot CoT setting.
| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
| Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | - | - | - |
| Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
| MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 |
| Model | Overall | Easy | Hard | Short | Medium | Long |
|---|---|---|---|---|---|---|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| With CoT | | | | | | |
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| DeepSeek-V3 | - | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| MiniMax-Text-01 | 56.5 | 66.1 | 50.5 | 61.7 | 56.7 | 47.2 |
| Without CoT | | | | | | |
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| DeepSeek-V3 | 48.7 | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | 44.4 |
| MiniMax-Text-01 | 52.9 | 60.9 | 47.9 | 58.9 | 52.6 | 43.5 |
| Context Type | No Context | Half Book | Full Book | Δ Half Book | Δ Full Book |
|---|---|---|---|---|---|
| English → Kalam (ChrF) | | | | | |
| GPT-4o (11-20) | 9.90 | 54.30 | - | 44.40 | - |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | 57.90 | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| MiniMax-Text-01 | 6.0 | 51.74 | 51.60 | 45.7 | 45.6 |
| Kalam → English (BLEURT) | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | 61.52 | 63.09 | 29.50 | 31.07 |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| MiniMax-Text-01 | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |
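The two improvement columns appear to be the in-context score minus the no-context score. The sketch below checks that reading against three rows copied from the table; the helper name `deltas` is hypothetical. A few other rows (for example Claude-3.5-Sonnet on English → Kalam) differ from this subtraction by 0.01, presumably because the published scores were rounded before or after the difference was taken.

```python
# (no context, half book, full book, delta half, delta full), copied from the table above.
ROWS = {
    "Gemini-1.5-Pro (002), English -> Kalam": (16.79, 53.68, 57.90, 36.89, 41.11),
    "Qwen-Long, Kalam -> English": (30.13, 53.14, 32.15, 23.01, 2.02),
    "MiniMax-Text-01, Kalam -> English": (33.65, 57.10, 58.00, 23.45, 24.35),
}


def deltas(no_context: float, half_book: float, full_book: float) -> tuple:
    """Improvement over the no-context score, rounded to the table's precision."""
    return round(half_book - no_context, 2), round(full_book - no_context, 2)


mismatches = []
for name, (no_ctx, half, full, d_half, d_full) in ROWS.items():
    got_half, got_full = deltas(no_ctx, half, full)
    # Compare with a small tolerance to sidestep floating-point noise.
    if abs(got_half - d_half) > 0.005 or abs(got_full - d_full) > 0.005:
        mismatches.append(name)
    print(f"{name}: +{got_half:.2f} (half book), +{got_full:.2f} (full book)")
```

Read this way, the Qwen-Long row shows a full-book gain of only 2.02 on Kalam → English, far below its half-book gain of 23.01.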
Here is a simple example of loading the tokenizer and model to generate content.
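The script that follows spreads the decoder layers evenly across GPUs with a hand-built `device_map`. As a minimal sketch of that one step in isolation, the mapping can be computed and inspected without loading any weights. The helper name `build_device_map` is hypothetical, and it assumes the layer count divides evenly by the GPU count, which the script also relies on.

```python
def build_device_map(num_hidden_layers: int, world_size: int) -> dict:
    """Assign contiguous blocks of decoder layers to GPUs 0..world_size-1."""
    if num_hidden_layers % world_size != 0:
        # With a remainder, the trailing layers would be left without a device.
        raise ValueError("num_hidden_layers must be divisible by world_size")
    layers_per_device = num_hidden_layers // world_size

    # Embeddings go on the first GPU, final norm and output head on the last.
    device_map = {
        "model.embed_tokens": "cuda:0",
        "model.norm": f"cuda:{world_size - 1}",
        "lm_head": f"cuda:{world_size - 1}",
    }
    for layer in range(num_hidden_layers):
        device_map[f"model.layers.{layer}"] = f"cuda:{layer // layers_per_device}"
    return device_map


# Hypothetical sizes for illustration; the real layer count comes from the model config.
demo = build_device_map(num_hidden_layers=16, world_size=8)
print(demo["model.layers.0"], demo["model.layers.15"])
```

Keeping the mapping in a pure function makes it easy to check the placement for a different GPU count before committing to a long model load.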
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig

# load hf config
hf_config = AutoConfig.from_pretrained("MiniMaxAI/MiniMax-Text-01", trust_remote_code=True)

# quantization config, int8 is recommended
quantization_config = QuantoConfig(
    weights="int8",
    modules_to_not_convert=[
        "lm_head",
        "embed_tokens",
    ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
    + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
)

# assume 8 GPUs
world_size = 8
layers_per_device = hf_config.num_hidden_layers // world_size

# set device map
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01")
prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")

# load bfloat16 model, move to device, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    trust_remote_code=True,
    offload_buffers=True,
)

# generate response
generation_config = GenerationConfig(
    max_new_tokens=20,
    eos_token_id=200020,
    use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")

# strip the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

For production deployment, we recommend using vLLM to serve MiniMax-Text-01. vLLM performs very well at serving large language models and offers the following features:
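- 🔥 Outstanding serving throughput
- ⚡ Efficient and intelligent memory management
- 📦 Powerful batch request processing
- ⚙️ Deeply optimized low-level performance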
For detailed deployment instructions, please refer to our vLLM Deployment Guide.
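Once a vLLM server is running, it can be queried through vLLM's OpenAI-compatible chat completions endpoint. The sketch below only builds the HTTP request, using the Python standard library. The address `http://localhost:8000` and the served model name are assumptions about a local setup, not values taken from the deployment guide.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"        # assumption: where the vLLM server listens
MODEL_NAME = "MiniMaxAI/MiniMax-Text-01"  # assumption: the name the server was started with


def build_chat_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Hello!")
# print(send(request))  # uncomment once a server is actually running
```

Separating request construction from sending keeps the payload easy to inspect and lets the same builder be reused with any HTTP client.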
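MiniMax-Text-01 supports function calling, which lets the model intelligently identify when an external function needs to be called and output the parameters in structured JSON format. With function calling, you can: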
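A minimal sketch of consuming such structured output on the application side follows. It assumes the model's call arrives as a JSON object with `name` and `arguments` fields; that shape, the `get_weather` tool, and the `dispatch` helper are all hypothetical and are not the documented MiniMax format.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would query a weather service."""
    return f"Weather lookup requested for {city}"


# Registry of functions the application is willing to expose to the model.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> Any:
    """Parse a JSON tool call and invoke the matching registered function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        # Never execute a function the application did not register.
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Assumed example of what the model might emit for a weather question.
model_output = '{"name": "get_weather", "arguments": {"city": "Shanghai"}}'
result = dispatch(model_output)
print(result)
```

Looking the name up in an explicit registry, rather than evaluating it, keeps the model from triggering arbitrary code.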
@misc{minimax2025minimax01scalingfoundationmodels,
title={MiniMax-01: Scaling Foundation Models with Lightning Attention},
author={MiniMax and Aonian Li and Bangwei Gong and Bo Yang and Boji Shan and Chang Liu and Cheng Zhu and Chunhao Zhang and Congchao Guo and Da Chen and Dong Li and Enwei Jiao and Gengxin Li and Guojun Zhang and Haohai Sun and Houze Dong and Jiadai Zhu and Jiaqi Zhuang and Jiayuan Song and Jin Zhu and Jingtao Han and Jingyang Li and Junbin Xie and Junhao Xu and Junjie Yan and Kaishun Zhang and Kecheng Xiao and Kexi Kang and Le Han and Leyang Wang and Lianfei Yu and Liheng Feng and Lin Zheng and Linbo Chai and Long Xing and Meizhi Ju and Mingyuan Chi and Mozhi Zhang and Peikai Huang and Pengcheng Niu and Pengfei Li and Pengyu Zhao and Qi Yang and Qidi Xu and Qiexiang Wang and Qin Wang and Qiuhui Li and Ruitao Leng and Shengmin Shi and Shuqi Yu and Sichen Li and Songquan Zhu and Tao Huang and Tianrun Liang and Weigao Sun and Weixuan Sun and Weiyu Cheng and Wenkai Li and Xiangjun Song and Xiao Su and Xiaodong Han and Xinjie Zhang and Xinzhu Hou and Xu Min and Xun Zou and Xuyang Shen and Yan Gong and Yingjie Zhu and Yipeng Zhou and Yiran Zhong and Yongyi Hu and Yuanxiang Fan and Yue Yu and Yufeng Yang and Yuhao Li and Yunan Huang and Yunji Li and Yunpeng Huang and Yunzhi Xu and Yuxin Mao and Zehan Li and Zekang Li and Zewei Tao and Zewen Ying and Zhaoyang Cong and Zhen Qin and Zhenhua Fan and Zhihang Yu and Zhuo Jiang and Zijia Wu},
year={2025},
eprint={2501.08313},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.08313},
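}

For general use and evaluation, we provide a Chatbot with online search capabilities and an online API for developers. For developers' convenience, we also provide the MiniMax MCP Server, which supports video generation, image generation, speech synthesis, and voice cloning.
If you have any questions, please contact us at model@minimaxi.com.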