🤗 Hugging Chat | 📰 Tech Blog
Kimi K2.6 is an open-source, natively multimodal agentic model that delivers significant improvements in practical capabilities such as long-horizon coding, code-driven design, proactive autonomous execution, and agent-swarm task orchestration.
| Architecture | Mixture-of-Experts (MoE) |
|---|---|
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT |
| Vision Encoder Parameters | 400M |
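The table's parameter figures can be sanity-checked from the MoE configuration itself. The back-of-the-envelope sketch below counts only the SwiGLU expert weights (a SwiGLU FFN has three projection matrices: gate, up, and down); attention, embeddings, and the MoonViT encoder are ignored, which is why the activated estimate lands below the 32B total above:

```python
# Rough parameter count for the MoE FFN layers from the config table.
d_model, d_ff = 7168, 2048                 # attention hidden dim, MoE hidden dim per expert
n_layers, n_dense = 61, 1
n_moe_layers = n_layers - n_dense          # 60 MoE layers
params_per_expert = 3 * d_model * d_ff     # gate + up + down projections, ~44M

total_expert_params = n_moe_layers * 384 * params_per_expert
active_expert_params = n_moe_layers * (8 + 1) * params_per_expert  # 8 routed + 1 shared

print(f"total expert params:  {total_expert_params / 1e12:.2f}T")   # → 1.01T
print(f"active expert params: {active_expert_params / 1e9:.1f}B")   # → 23.8B
```

The expert weights alone account for roughly 1.01T of the 1T headline figure and about 24B of the 32B activated parameters, with attention, embeddings, and the vision encoder making up the remainder.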
| Benchmark | Kimi K2.6 | GPT-5.4 (xhigh) | Claude Opus 4.6 (max effort) | Gemini 3.1 Pro (thinking high) | Kimi K2.5 |
|---|---|---|---|---|---|
| **Agentic** |  |  |  |  |  |
| HLE-Full (w/ tools) | 54.0 | 52.1 | 53.0 | 51.4 | 50.2 |
| BrowseComp | 83.2 | 82.7 | 83.7 | 85.9 | 74.9 |
| BrowseComp (agent swarm) | 86.3 | - | - | - | 78.4 |
| DeepSearchQA (F1) | 92.5 | 78.6 | 91.3 | 81.9 | 89.0 |
| DeepSearchQA (accuracy) | 83.0 | 63.7 | 80.6 | 60.2 | 77.1 |
| WideSearch (item-F1) | 80.8 | - | - | - | 72.7 |
| Toolathlon | 50.0 | 54.6 | 47.2 | 48.8 | 27.8 |
| MCPMark | 55.9 | 62.5* | 56.7* | 55.9* | 29.5 |
| Claw Eval (pass^3) | 62.3 | 60.3 | 70.4 | 57.8 | 52.3 |
| Claw Eval (pass@3) | 80.9 | 78.4 | 82.4 | 82.9 | 75.4 |
| APEX-Agents | 27.9 | 33.3 | 33.0 | 32.0 | 11.5 |
| OSWorld-Verified | 73.1 | 75.0 | 72.7 | - | 63.3 |
| **Coding** |  |  |  |  |  |
| Terminal-Bench 2.0 (Terminus-2) | 66.7 | 65.4* | 65.4 | 68.5 | 50.8 |
| SWE-Bench Pro | 58.6 | 57.7 | 53.4 | 54.2 | 50.7 |
| SWE-Bench Multilingual | 76.7 | - | 77.8 | 76.9* | 73.0 |
| SWE-Bench Verified | 80.2 | - | 80.8 | 80.6 | 76.8 |
| SciCode | 52.2 | 56.6 | 51.9 | 58.9 | 48.7 |
| OJBench (python) | 60.6 | - | 60.3 | 70.7 | 54.7 |
| LiveCodeBench (v6) | 89.6 | - | 88.8 | 91.7 | 85.0 |
| **Reasoning & Knowledge** |  |  |  |  |  |
| HLE-Full | 34.7 | 39.8 | 40.0 | 44.4 | 30.1 |
| AIME 2026 | 96.4 | 99.2 | 96.7 | 98.3 | 95.8 |
| HMMT 2026 (Feb) | 92.7 | 97.7 | 96.2 | 94.7 | 87.1 |
| IMO-AnswerBench | 86.0 | 91.4 | 75.3 | 91.0* | 81.8 |
| GPQA-Diamond | 90.5 | 92.8 | 91.3 | 94.3 | 87.6 |
| **Vision** |  |  |  |  |  |
| MMMU-Pro | 79.4 | 81.2 | 73.9 | 83.0* | 78.5 |
| MMMU-Pro (w/ python) | 80.1 | 82.1 | 77.3 | 85.3* | 77.7 |
| CharXiv (RQ) | 80.4 | 82.8* | 69.1 | 80.2* | 77.5 |
| CharXiv (RQ) (w/ python) | 86.7 | 90.0* | 84.7 | 89.9* | 78.7 |
| MathVision | 87.4 | 92.0* | 71.2* | 89.8* | 84.2 |
| MathVision (w/ python) | 93.2 | 96.1* | 84.6* | 95.7* | 85.0 |
| BabyVision | 39.8 | 49.7 | 14.8 | 51.6 | 36.5 |
| BabyVision (w/ python) | 68.5 | 80.2* | 38.4* | 68.3* | 40.5 |
| V* (w/ python) | 96.9 | 98.4* | 86.4* | 96.9* | 86.9 |
\* Except for the results marked with an asterisk, all results are quoted from official reports. Kimi-K2.6 uses the same native INT4 quantization as Kimi-K2-Thinking.
> [!Note]
> You can access Kimi-K2.6's API on https://platform.moonshot.ai ; we provide OpenAI/Anthropic-compatible APIs. To verify that a deployment is correct, we also provide the Kimi Vendor Verifier tool.

Currently, it is recommended to run Kimi-K2.6 on the following inference engines:
Kimi-K2.6 shares the same architecture as Kimi-K2.5, so existing deployment methods can be reused directly.
The required `transformers` version is >=4.57.1, <5.0.0.
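Before loading the model, the version constraint above can be checked programmatically. A minimal sketch (the helper name is ours; it assumes plain `X.Y.Z` version strings without pre-release suffixes):

```python
def version_ok(version: str) -> bool:
    """Return True if `version` satisfies >=4.57.1, <5.0.0."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return (4, 57, 1) <= parts < (5, 0, 0)

# Usage: check the installed transformers build before loading the model, e.g.
#   import transformers
#   assert version_ok(transformers.__version__), "unsupported transformers version"
print(version_ok("4.57.1"))  # → True
print(version_ok("5.0.0"))   # → False
```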
For deployment examples, see the Model Deployment Guide.
The usage examples below show how to call our official API.
For third-party APIs deployed with vLLM or SGLang, please note:
> [!Note]
> Chatting over video content is an experimental feature and is currently supported only in our official API.
The recommended `temperature` is 1.0 for Thinking mode and 0.6 for Instant mode. The recommended `top_p` is 0.95. To use Instant mode, pass `{'chat_template_kwargs': {"thinking": False}}` in `extra_body`.
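The recommendations above can be collected into a small helper that builds per-mode request kwargs. This is a sketch under stated assumptions: the helper name is ours, and the `extra_body` payload shown is the vLLM/SGLang variant; the official API disables thinking via `{'thinking': {'type': 'disabled'}}` instead:

```python
def sampling_kwargs(instant: bool = False) -> dict:
    """Recommended K2.6 request parameters for Thinking vs Instant mode."""
    if instant:
        # Instant mode: lower temperature, thinking disabled via the chat template.
        return {
            "temperature": 0.6,
            "top_p": 0.95,
            "extra_body": {"chat_template_kwargs": {"thinking": False}},
        }
    # Thinking mode (default): higher temperature, no extra flags needed.
    return {"temperature": 1.0, "top_p": 0.95}

# Usage with an OpenAI-compatible client:
#   client.chat.completions.create(model=..., messages=..., **sampling_kwargs(instant=True))
```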
Below is a simple chat-completion script showing how to call the K2.6 API in both Thinking and Instant modes.
```python
import openai


def simple_chat(client: openai.OpenAI, model_name: str):
    messages = [
        {'role': 'system', 'content': 'You are Kimi, an AI assistant created by Moonshot AI.'},
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'which one is bigger, 9.11 or 9.9? think carefully.'}
            ],
        },
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # To use Instant mode, pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for the official API
        # extra_body={'chat_template_kwargs': {"thinking": False}},  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
```

K2.6 supports image and video inputs.
The example below shows how to call the K2.6 API with image input:
```python
import base64

import openai
import requests


def chat_with_image(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/kimi-logo.png'
    image_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Describe this image in detail.'},
                {
                    'type': 'image_url',
                    'image_url': {'url': f'data:image/png;base64,{image_base64}'},
                },
            ],
        }
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=8192
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported if you pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for the official API
        # extra_body={'chat_template_kwargs': {"thinking": False}},  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content
```

The example below shows how to call the K2.6 API with video input:
```python
import base64

import openai
import requests


def chat_with_video(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/demo_video.mp4'
    video_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Describe the video in detail.'},
                {
                    'type': 'video_url',
                    'video_url': {'url': f'data:video/mp4;base64,{video_base64}'},
                },
            ],
        }
    ]
    response = client.chat.completions.create(model=model_name, messages=messages)
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported if you pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for the official API
        # extra_body={'chat_template_kwargs': {"thinking": False}},  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content
```

Kimi K2.6 supports a preserve_thinking mode, which retains the full reasoning content across multiple turns and improves performance in coding-agent scenarios.
This feature is off by default. The example below shows how to call the K2.6 API in preserve_thinking mode:
```python
import openai


def chat_with_preserve_thinking(client: openai.OpenAI, model_name: str):
    messages = [
        {
            'role': 'user',
            'content': 'Tell me three random numbers.'
        },
        {
            'role': 'assistant',
            'reasoning_content': "I'll start by listing five numbers: 473, 921, 235, 215, 222, and I'll tell you the first three.",
            'content': '473, 921, 235'
        },
        {
            'role': 'user',
            'content': 'What are the other two numbers you have in mind?'
        }
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'enabled', 'keep': 'all'}},  # this is for the official API
        # extra_body={'chat_template_kwargs': {'thinking': True, 'preserve_thinking': True}},  # this is for vLLM/SGLang
        # We recommend enabling preserve_thinking only in Thinking mode.
    )
    # The assistant should mention 215 and 222, which appear in the prior reasoning content.
    print(f'reasoning: {response.choices[0].message.reasoning}')
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content
```
K2.6 retains the interleaved thinking and multi-step tool-calling design of K2 Thinking. For usage examples, please refer to the K2 Thinking documentation.
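As a rough sketch of what such a multi-step tool-calling loop looks like against an OpenAI-compatible API (the helper name, the dispatch-table convention, and the step limit are ours; see the K2 Thinking documentation for the authoritative flow):

```python
import json


def run_tool_loop(client, model_name, messages, tools, dispatch, max_steps=5):
    """Minimal agent loop: call the model, execute any requested tools,
    feed the results back, and repeat until the model stops calling tools."""
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model=model_name, messages=messages, tools=tools
        )
        msg = response.choices[0].message
        messages.append(msg)  # keep the assistant turn (incl. any tool calls) in history
        if not msg.tool_calls:
            return msg.content  # final answer, no more tools requested
        for call in msg.tool_calls:
            # `dispatch` maps tool names to local Python callables.
            result = dispatch[call.function.name](**json.loads(call.function.arguments))
            messages.append({
                'role': 'tool',
                'tool_call_id': call.id,
                'content': json.dumps(result),
            })
    return None  # step budget exhausted
```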
Kimi K2.6 works best with Kimi Code CLI as its agentic framework; try it at https://www.kimi.com/code.
Both the code repository and the model weights are released under the Modified MIT License.
If you have any questions, please contact us at support@moonshot.ai.