Kimi-K2.6

MoonshotAI/Kimi-K2.6

🤗 huggingchat | 📰 Tech Blog

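1. Model Introduction

Kimi K2.6 is an open-source, native multimodal agentic model that delivers substantial improvements in practical capabilities: long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

Key Features

- Long-horizon coding: K2.6 shows significant improvements on complex, end-to-end coding tasks and generalizes robustly across programming languages (Rust, Go, Python) and across domains such as front-end, DevOps, and performance optimization.
- Coding-driven design: K2.6 turns simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision.
- Elevated agent swarm: K2.6 scales horizontally to 300 sub-agents executing 4,000 coordinated steps. It dynamically decomposes tasks into parallel, domain-specialized subtasks and delivers end-to-end outputs, from documents to websites to spreadsheets, in a single autonomous run.
- Proactive and open orchestration: for autonomous tasks, K2.6 performs strongly at powering persistent, 24/7 background agents that proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.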

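2. Model Summary

| Property | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense Layer Included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT |
| Parameters of Vision Encoder | 400M |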
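The sparsity implied by these figures can be checked with a little arithmetic. The sketch below uses only the numbers listed above; reading "1T" as 1e12 and "32B" as 32e9 is an assumption about rounding, and it also assumes that the 384 experts count routed experts only, with the shared expert active for every token.

```python
# Figures from the model summary; "1T" and "32B" are rounded (assumption).
TOTAL_PARAMS = 1e12
ACTIVATED_PARAMS = 32e9
NUM_EXPERTS = 384               # assumed to be routed experts only
SELECTED_EXPERTS_PER_TOKEN = 8
SHARED_EXPERTS = 1              # assumed to be active for every token


def activated_fraction(total: float, activated: float) -> float:
    """Share of all parameters that take part in one forward pass."""
    return activated / total


def experts_per_token(selected: int, shared: int) -> int:
    """Experts evaluated per token: routed selections plus shared experts."""
    return selected + shared


def routed_fraction(selected: int, num_experts: int) -> float:
    """Share of routed experts chosen for a single token."""
    return selected / num_experts


if __name__ == "__main__":
    print(f"activated parameters: {activated_fraction(TOTAL_PARAMS, ACTIVATED_PARAMS):.1%}")
    print(f"experts per token: {experts_per_token(SELECTED_EXPERTS_PER_TOKEN, SHARED_EXPERTS)}")
    print(f"routed experts used: {routed_fraction(SELECTED_EXPERTS_PER_TOKEN, NUM_EXPERTS):.2%}")
```

Under these assumptions roughly 3.2% of the parameters are active per token, which is why the per-token compute is closer to that of a 32B dense model than to a 1T one.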

3. Evaluation Results

| Benchmark | Kimi K2.6 | GPT-5.4 (xhigh) | Claude Opus 4.6 (max effort) | Gemini 3.1 Pro (thinking high) | Kimi K2.5 |
|---|---|---|---|---|---|
| Agentic | | | | | |
| HLE-Full (w/ tools) | 54.0 | 52.1 | 53.0 | 51.4 | 50.2 |
| BrowseComp | 83.2 | 82.7 | 83.7 | 85.9 | 74.9 |
| BrowseComp (Agent Swarm) | 86.3 | - | - | - | 78.4 |
| DeepSearchQA (f1-score) | 92.5 | 78.6 | 91.3 | 81.9 | 89.0 |
| DeepSearchQA (accuracy) | 83.0 | 63.7 | 80.6 | 60.2 | 77.1 |
| WideSearch (item-f1) | 80.8 | - | - | - | 72.7 |
| Toolathlon | 50.0 | 54.6 | 47.2 | 48.8 | 27.8 |
| MCPMark | 55.9 | 62.5* | 56.7* | 55.9* | 29.5 |
| Claw Eval (pass^3) | 62.3 | 60.3 | 70.4 | 57.8 | 52.3 |
| Claw Eval (pass@3) | 80.9 | 78.4 | 82.4 | 82.9 | 75.4 |
| APEX-Agents | 27.9 | 33.3 | 33.0 | 32.0 | 11.5 |
| OSWorld-Verified | 73.1 | 75.0 | 72.7 | - | 63.3 |
| Coding | | | | | |
| Terminal-Bench 2.0 (Terminus-2) | 66.7 | 65.4* | 65.4 | 68.5 | 50.8 |
| SWE-Bench Pro | 58.6 | 57.7 | 53.4 | 54.2 | 50.7 |
| SWE-Bench Multilingual | 76.7 | - | 77.8 | 76.9* | 73.0 |
| SWE-Bench Verified | 80.2 | - | 80.8 | 80.6 | 76.8 |
| SciCode | 52.2 | 56.6 | 51.9 | 58.9 | 48.7 |
| OJBench (python) | 60.6 | - | 60.3 | 70.7 | 54.7 |
| LiveCodeBench (v6) | 89.6 | - | 88.8 | 91.7 | 85.0 |
| Reasoning & Knowledge | | | | | |
| HLE-Full | 34.7 | 39.8 | 40.0 | 44.4 | 30.1 |
| AIME 2026 | 96.4 | 99.2 | 96.7 | 98.3 | 95.8 |
| HMMT 2026 (Feb) | 92.7 | 97.7 | 96.2 | 94.7 | 87.1 |
| IMO-AnswerBench | 86.0 | 91.4 | 75.3 | 91.0* | 81.8 |
| GPQA-Diamond | 90.5 | 92.8 | 91.3 | 94.3 | 87.6 |
| Vision | | | | | |
| MMMU-Pro | 79.4 | 81.2 | 73.9 | 83.0* | 78.5 |
| MMMU-Pro (w/ python) | 80.1 | 82.1 | 77.3 | 85.3* | 77.7 |
| CharXiv (RQ) | 80.4 | 82.8* | 69.1 | 80.2* | 77.5 |
| CharXiv (RQ) (w/ python) | 86.7 | 90.0* | 84.7 | 89.9* | 78.7 |
| MathVision | 87.4 | 92.0* | 71.2* | 89.8* | 84.2 |
| MathVision (w/ python) | 93.2 | 96.1* | 84.6* | 95.7* | 85.0 |
| BabyVision | 39.8 | 49.7 | 14.8 | 51.6 | 36.5 |
| BabyVision (w/ python) | 68.5 | 80.2* | 38.4* | 68.3* | 40.5 |
| V* (w/ python) | 96.9 | 98.4* | 86.4* | 96.9* | 86.9 |
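The two Claw Eval rows differ only in how repeated trials are aggregated. The sketch below follows the common reading of these metrics: pass@3 counts a task as solved when at least one of three trials succeeds, while pass^3 requires all three to succeed. Whether Claw Eval computes them exactly this way is an assumption, and the trial outcomes are made-up illustration data.

```python
from typing import Dict, List


def pass_at_k(trials: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where at least one trial succeeded."""
    return sum(any(results) for results in trials.values()) / len(trials)


def pass_hat_k(trials: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where every trial succeeded."""
    return sum(all(results) for results in trials.values()) / len(trials)


if __name__ == "__main__":
    # Hypothetical outcomes: four tasks, three trials each.
    outcomes = {
        "task-a": [True, True, True],
        "task-b": [True, False, True],
        "task-c": [False, False, False],
        "task-d": [False, True, False],
    }
    print(f"pass@3 = {pass_at_k(outcomes):.2f}")   # 3 of 4 tasks
    print(f"pass^3 = {pass_hat_k(outcomes):.2f}")  # 1 of 4 tasks
```

Under this reading pass^3 can never exceed pass@3, which matches the ordering of the two rows for every model in the table.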

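Footnotes

General Testing Details
- We report results for Kimi K2.6 and Kimi K2.5 with thinking mode enabled, Claude Opus 4.6 with max effort, GPT-5.4 with xhigh reasoning effort, and Gemini 3.1 Pro with a high thinking level.
- Unless otherwise specified, all Kimi K2.6 experiments were conducted with temperature = 1.0, top-p = 1.0, and a context length of 262,144 tokens.
- Benchmarks without publicly available scores were re-evaluated under the same conditions used for Kimi K2.6 and are marked with an asterisk (*). All results not marked with an asterisk are cited from official reports.
Reasoning Benchmarks
- IMO-AnswerBench scores for GPT-5.4 and Claude 4.6 were obtained from z.ai/blog/glm-5.1.
- Humanity's Last Exam (HLE) and other reasoning tasks were evaluated with a maximum generation length of 98,304 tokens. By default, we report results on the HLE full set. On the text-only subset, Kimi K2.6 achieves 36.4% accuracy without tools and 55.5% with tools.
Tool-Augmented / Agentic Tasks
- Kimi K2.6 was equipped with search, code-interpreter, and web-browsing tools for HLE (w/ tools), BrowseComp, DeepSearchQA, and WideSearch.
- For HLE-Full with tools, the maximum generation length is 262,144 tokens with a per-step limit of 49,152 tokens. We use a simple context management strategy: once the context window exceeds a threshold, only the most recent round of tool-related messages is retained.
- For BrowseComp, we report scores obtained with context management using the same discard-all strategy as Kimi K2.5 and DeepSeek-V3.2.
- For DeepSearchQA, no context management was applied to the Kimi K2.6 tests, and tasks exceeding the supported context length were counted as failed. Scores for Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on DeepSearchQA are cited from the Claude Opus 4.7 System Card.
- For WideSearch, we report results under the "hide tool result" context management setting: once the context window exceeds a threshold, only the most recent round of tool-related messages is retained.
- The test system prompts are identical to those used in the Kimi K2.5 technical report.
- Claw Eval was run with version 1.1 and max-tokens-per-step = 16384.
- For APEX-Agents, we evaluated 452 of the 480 publicly released tasks, following the same approach as Artificial Analysis (excluding Investment Banking Worlds 244 and 246, which have external runtime dependencies).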
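The "keep only the most recent round of tool-related messages" strategy described for HLE-Full with tools and for WideSearch can be sketched as follows. This is one possible interpretation, not the evaluation harness itself: the OpenAI-style message shape, the character-based token estimate, and the definition of a "round" (the last assistant message with tool calls plus the tool results after it) are all assumptions.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def estimate_tokens(messages: List[Message]) -> int:
    """Crude stand-in for a tokenizer: about 4 characters per token (assumption)."""
    return sum(len(str(m.get("content") or "")) for m in messages) // 4


def is_tool_related(message: Message) -> bool:
    """Tool results, or assistant messages that issued tool calls."""
    return message.get("role") == "tool" or bool(message.get("tool_calls"))


def prune_tool_messages(messages: List[Message], threshold_tokens: int) -> List[Message]:
    """Once over the threshold, drop all tool-related messages except the latest round."""
    if estimate_tokens(messages) <= threshold_tokens:
        return list(messages)

    # The latest round starts at the last assistant message that called tools.
    last_round_start = None
    for index, message in enumerate(messages):
        if message.get("role") == "assistant" and message.get("tool_calls"):
            last_round_start = index

    kept = []
    for index, message in enumerate(messages):
        in_last_round = last_round_start is not None and index >= last_round_start
        if not is_tool_related(message) or in_last_round:
            kept.append(message)
    return kept


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Find the answer."},
        {"role": "assistant", "content": None, "tool_calls": [{"id": "call-1"}]},
        {"role": "tool", "tool_call_id": "call-1", "content": "x" * 400},
        {"role": "assistant", "content": None, "tool_calls": [{"id": "call-2"}]},
        {"role": "tool", "tool_call_id": "call-2", "content": "y" * 400},
    ]
    pruned = prune_tool_messages(history, threshold_tokens=50)
    print([m["role"] for m in pruned])
```

Plain user and assistant messages are always kept in this sketch, so the model still sees the task and its own conclusions even after older tool output is dropped.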
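Coding Tasks
- Terminal-Bench 2.0 scores were obtained with the default agent framework (Terminus-2) and the provided JSON parser, running in preserve-thinking mode.
- For the SWE-Bench series of evaluations (Verified, Multilingual, and Pro), we used an in-house evaluation framework adapted from SWE-agent. The framework includes a minimal set of tools: a bash tool, a createfile tool, an insert tool, a view tool, a strreplace tool, and a submit tool.
- All reported coding-task scores are averaged over 10 independent runs.
Vision Benchmarks
- Max tokens = 98,304, averaged over three runs (avg@3).
- For settings with the Python tool, max-tokens-per-step = 65,536 and the maximum number of steps for multi-step reasoning is 50.
- MMMU-Pro follows the official protocol, preserving input order and prepending images.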

4. Native INT4 Quantization

Kimi-K2.6 adopts the same native INT4 quantization method as Kimi-K2-Thinking.

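5. Deployment

[!Note] You can access the Kimi-K2.6 API at https://platform.moonshot.ai, where we provide OpenAI/Anthropic-compatible APIs. To verify that a deployment is correct, we also provide the Kimi Vendor Verifier.

Currently, we recommend running Kimi-K2.6 on the following inference engines:

- vLLM
- SGLang
- KTransformers

Kimi-K2.6 has the same architecture as Kimi-K2.5, so the deployment method can be reused directly.

The required transformers version is >=4.57.1, <5.0.0.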
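A quick way to confirm that an installed version falls inside this range is sketched below. The parser is a deliberately simple, hypothetical helper that compares only the numeric major, minor, and patch parts; for pre-release or development builds a dedicated version-parsing library would be more reliable.

```python
from itertools import takewhile
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the leading digits of each component."""
    numbers = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return (numbers[0], numbers[1], numbers[2])


def transformers_version_ok(version: str) -> bool:
    """True when the version satisfies >=4.57.1, <5.0.0."""
    return (4, 57, 1) <= parse_version(version) < (5, 0, 0)


if __name__ == "__main__":
    for candidate in ("4.57.0", "4.57.1", "4.60", "5.0.0"):
        print(candidate, transformers_version_ok(candidate))
```

In a real environment the string to check would come from `transformers.__version__`.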

Deployment examples can be found in the Model Deployment Guide.

6. Model Usage

The usage examples below show how to call our official API.
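The example functions in this section take an `openai.OpenAI` client and a model name as arguments but do not show how the client is built. A minimal sketch follows. The environment variable names `MOONSHOT_API_KEY` and `MOONSHOT_BASE_URL` are hypothetical, and the default base URL is an assumption that should be confirmed on https://platform.moonshot.ai; the model name is not stated here and must be taken from the platform or from your own deployment.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed default endpoint; confirm it on https://platform.moonshot.ai.
DEFAULT_BASE_URL = "https://api.moonshot.ai/v1"


def client_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect connection settings from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("MOONSHOT_API_KEY", "")
    if not api_key:
        raise ValueError("MOONSHOT_API_KEY is not set")
    return {
        "base_url": env.get("MOONSHOT_BASE_URL", DEFAULT_BASE_URL),
        "api_key": api_key,
    }


def make_client(env: Optional[Mapping[str, str]] = None):
    """Build an OpenAI-compatible client; requires the openai package."""
    import openai  # imported lazily so client_settings works without it

    return openai.OpenAI(**client_settings(env))
```

For a self-hosted vLLM or SGLang server, pointing `MOONSHOT_BASE_URL` at the server's OpenAI-compatible endpoint should be enough to reuse the same helper.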

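For third-party APIs deployed with vLLM or SGLang, please note:

[!Note]

- Chat with video content is an experimental feature and is currently supported only in our official API.
- The recommended temperature is 1.0 for Thinking mode and 0.6 for Instant mode.
- The recommended top_p is 0.95.
- To use Instant mode, pass {'chat_template_kwargs': {'thinking': False}} in extra_body.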
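The notes above can be folded into a small helper that returns the request options for a given mode and backend. This is a sketch rather than part of any SDK: the function names are hypothetical, the official-API form of `extra_body` is taken from the code examples in this section, and returning an empty `extra_body` for Thinking mode assumes that thinking is the default on both backends.

```python
from typing import Any, Dict


def sampling_params(thinking: bool) -> Dict[str, float]:
    """Recommended sampling settings: temperature 1.0 (Thinking) or 0.6 (Instant)."""
    return {"temperature": 1.0 if thinking else 0.6, "top_p": 0.95}


def mode_extra_body(thinking: bool, official_api: bool) -> Dict[str, Any]:
    """extra_body that selects Thinking or Instant mode for the chosen backend."""
    if thinking:
        return {}  # assumption: Thinking mode is the default
    if official_api:
        return {"thinking": {"type": "disabled"}}
    return {"chat_template_kwargs": {"thinking": False}}  # vLLM / SGLang


if __name__ == "__main__":
    print(sampling_params(thinking=False))
    print(mode_extra_body(thinking=False, official_api=False))
```

The results can be passed straight through, for example `client.chat.completions.create(..., **sampling_params(False), extra_body=mode_extra_body(False, official_api=True))`.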

Chat Completion

The following is a simple chat completion script that shows how to call the K2.6 API in Thinking and Instant modes.

import openai


def simple_chat(client: openai.OpenAI, model_name: str):
    messages = [
        {'role': 'system', 'content': 'You are Kimi, an AI assistant created by Moonshot AI.'},
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'which one is bigger, 9.11 or 9.9? think carefully.'}
            ],
        },
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # To use Instant mode on the official API, pass extra_body={'thinking': {'type': 'disabled'}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')

Chat Completion with Visual Content

K2.6 supports image and video input.

The following example shows how to call the K2.6 API with image input:

import openai
import base64
import requests

def chat_with_image(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/kimi-logo.png'
    image_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Describe this image in detail.'},
                {
                    'type': 'image_url',
                    # No space after the comma: the base64 payload follows it directly.
                    'image_url': {'url': f'data:image/png;base64,{image_base64}'},
                },
            ],
        }
    ]

    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=8192
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported: pass extra_body={'thinking': {'type': 'disabled'}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')

    return response.choices[0].message.content

The following example shows how to call the K2.6 API with video input:

import openai
import base64
import requests

def chat_with_video(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/demo_video.mp4'
    video_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text","text": "Describe the video in detail."},
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:video/mp4;base64,{video_base64}"},
                },
            ],
        }
    ]

    response = client.chat.completions.create(model=model_name, messages=messages)
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported: pass extra_body={'thinking': {'type': 'disabled'}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content
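Both visual examples build a base64 data URL by hand and wrap it in an `image_url` or `video_url` content part. The helper below is a sketch that does this for local bytes. The function names are hypothetical, and inferring the MIME type from the file name with the standard `mimetypes` module is an assumption about what is convenient, not something the API requires.

```python
import base64
import mimetypes
from typing import Any, Dict


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        raise ValueError(f"cannot infer a MIME type for {filename!r}")
    return f"data:{mime};base64,{base64.b64encode(data).decode()}"


def media_part(data: bytes, filename: str) -> Dict[str, Any]:
    """Build an image_url or video_url content part like those in the examples."""
    url = to_data_url(data, filename)
    if url.startswith("data:image/"):
        kind = "image_url"
    elif url.startswith("data:video/"):
        kind = "video_url"
    else:
        raise ValueError(f"{filename!r} is neither an image nor a video")
    return {"type": kind, kind: {"url": url}}


if __name__ == "__main__":
    part = media_part(b"hi", "example.png")
    print(part["type"], part["image_url"]["url"])
```

A user message can then combine a text part with `media_part(...)` in its `content` list, exactly as the image and video examples do.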

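Preserve Thinking

Kimi K2.6 supports a preserve_thinking mode, which retains the full reasoning content across multi-turn interactions and improves performance in coding-agent scenarios.

This feature is disabled by default. The following example shows how to call the K2.6 API in preserve_thinking mode: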

import openai


def chat_with_preserve_thinking(client: openai.OpenAI, model_name: str):
    messages = [
        {
            "role": "user",
            "content": "Tell me three random numbers."
        },
        {
            "role": "assistant",
            "reasoning_content": "I'll start by listing five numbers: 473, 921, 235, 215, 222, and I'll tell you the first three.",
            "content": "473, 921, 235"
        },
        {
            "role": "user",
            "content": "What are the other two numbers you have in mind?"
        }
    ]

    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'enabled', 'keep': 'all'}},  # this is for official API
        # extra_body={"chat_template_kwargs": {"thinking":True, "preserve_thinking": True}},  # this is for vLLM/SGLang
        # We recommend enabling preserve_thinking only in think mode.
    )
    # the assistant should mention 215 and 222 that appear in the prior reasoning content
    print(f"response: {response.choices[0].message.reasoning}")
    return response.choices[0].message.content
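For preserve_thinking to have anything to preserve, each earlier assistant turn has to be sent back with its reasoning attached, as the hard-coded history in the example does with the `reasoning_content` key. The sketch below shows one way to maintain such a history; the helper name is hypothetical and the message shape is copied from the example above.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


def append_assistant_turn(
    history: List[Message], content: str, reasoning: Optional[str] = None
) -> List[Message]:
    """Return a new history with the assistant turn and, if given, its reasoning."""
    message: Message = {"role": "assistant", "content": content}
    if reasoning:
        message["reasoning_content"] = reasoning
    return history + [message]


if __name__ == "__main__":
    history = [{"role": "user", "content": "Tell me three random numbers."}]
    history = append_assistant_turn(
        history,
        content="473, 921, 235",
        reasoning="I'll list five numbers: 473, 921, 235, 215, 222, and share the first three.",
    )
    history.append({"role": "user", "content": "What are the other two numbers?"})
    print([sorted(message) for message in history])
```

The function returns a new list instead of mutating its argument, so the same base history can be reused for several branches of a conversation.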

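Interleaved Thinking and Multi-Step Tool Calls

K2.6 uses the same design for interleaved thinking and multi-step tool calls as K2 Thinking. For usage examples, please refer to the K2 Thinking documentation.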
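As a rough illustration of what a multi-step tool-call loop looks like on the client side, the sketch below alternates model replies and tool executions until the model stops requesting tools. It is not taken from the K2 Thinking documentation, which remains the authoritative reference: the OpenAI-style message shape is an assumption, `call_model` is a hypothetical callable standing in for a real API call, and keeping each assistant reply (including any `reasoning_content`) in the history is this sketch's reading of "interleaved thinking".

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_tool_loop(
    call_model: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[..., Any]],
    messages: List[Message],
    max_steps: int = 8,
) -> List[Message]:
    """Call the model, run any requested tools, and repeat until a final answer."""
    history = list(messages)
    for _ in range(max_steps):
        reply = call_model(history)
        history.append(reply)  # the full reply, reasoning included, stays in context
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return history
        for call in tool_calls:
            function = tools[call["function"]["name"]]
            arguments = json.loads(call["function"]["arguments"])
            result = function(**arguments)
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    raise RuntimeError("tool loop did not finish within max_steps")


def stub_model(history: List[Message]) -> Message:
    """Hypothetical stand-in for the API: asks for one addition, then answers."""
    if history[-1]["role"] == "tool":
        return {"role": "assistant", "reasoning_content": "The tool returned the sum.",
                "content": "sum=" + history[-1]["content"]}
    call = {"id": "call-1", "type": "function",
            "function": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})}}
    return {"role": "assistant", "reasoning_content": "I should add the numbers.",
            "content": "", "tool_calls": [call]}


if __name__ == "__main__":
    final = run_tool_loop(stub_model, {"add": lambda a, b: a + b},
                          [{"role": "user", "content": "What is 2 + 3?"}])
    print(final[-1]["content"])
```

Replacing `stub_model` with a function that calls `client.chat.completions.create` and converts the returned message to a dict would turn this into a working loop against a real endpoint.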

Coding Agent Framework

Kimi K2.6 works best with Kimi Code CLI as its agent framework. Give it a try at https://www.kimi.com/code.

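7. License

Both the code repository and the model weights are released under the Modified MIT License.

8. Third Party Notices

See THIRD PARTY NOTICES

9. Contact Us

If you have any questions, please reach out at support@moonshot.ai.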