MiroThinker


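Introduction

The new MiroThinker family marks a major step forward in building reliable agents for long-chain tasks. With an enhanced post-training pipeline, the MiroThinker-1.7 family achieves SOTA performance on deep research tasks among open-source models.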

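Key Features

  • MiroThinker-1.7 supports a 256K context window, long-horizon reasoning, and deep multi-step analysis.
  • It handles up to 300 tool calls per task, now with more accurate step-by-step reasoning and decision-making.
  • It is released at 30B and 235B parameter scales, with a comprehensive suite of tools and workflows that flexibly supports diverse research scenarios and compute budgets.
  • Our proprietary agent, MiroThinker-H1, provides promising evidence for long-chain verifiable reasoning: its reasoning is verifiable both at the step level and globally, which improves performance on complex agentic workflows.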
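
| Model Name | Parameters | Max Context | Max Tool Calls | HF Link |
| --- | --- | --- | --- | --- |
| MiroThinker-1.7-mini | 30B | 256K | 300 | 🤗 link |
| MiroThinker-1.7 | 235B | 256K | 300 | 🤗 link |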

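MiroThinker-1.7 demonstrates strong general research performance across a broad range of benchmarks, achieving accuracies of 74.0%, 75.3%, 82.7%, and 42.9% on BrowseComp, BrowseComp-ZH, GAIA-Val-165, and HLE-Text, respectively. On BrowseComp-ZH, MiroThinker-1.7 achieves SOTA performance.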

For more details, please see our technical report.

Try MiroThinker Online

You are welcome to try MiroThinker, which offers a general agentic question-answering experience that outperforms OpenAI DeepResearch.

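Important: This online service is not intended for BrowseComp evaluation. To keep latency and stability under control, each query is limited to 100 tool calls. BrowseComp involves long-horizon tasks for which our agent typically needs more than 200 tool calls, which is beyond the scope of this demo.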

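Performance

To prevent potential information leakage (e.g., retrieving benchmark answers from HuggingFace), we blocked access to specific websites during evaluation.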
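
The blocking described above could be applied to search results along the following lines. This is only a minimal sketch: the domains actually blocked during evaluation are not listed in this document, so `BLOCKED_DOMAINS` is a hypothetical list that contains `huggingface.co` solely because it is the example named above, and the `link` field of each result is an assumed result format.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: the real list used during evaluation is not given here.
BLOCKED_DOMAINS = {"huggingface.co"}


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_results(results: list) -> list:
    """Drop search results whose (assumed) 'link' field points at a blocked domain."""
    return [r for r in results if not is_blocked(r.get("link", ""))]
```

Matching on the parsed hostname rather than on the raw URL string avoids false positives when a blocked domain merely appears in a path or query string.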

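Quick Start

For the best experience, we recommend using MiroThinker with our tool-enabled agent framework and with thinking mode turned on. For installation instructions, examples, and full documentation, see our GitHub repository: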

👉 https://github.com/MiroMindAI/MiroThinker

Local Deployment

We recommend deploying with SGLang or vLLM:

# SGLang
python -m sglang.launch_server --model-path miromind-ai/MiroThinker-1.7-mini --tp 8 --host 0.0.0.0 --port 1234
# vLLM
vllm serve miromind-ai/MiroThinker-1.7-mini --tensor-parallel-size 8 --max-model-len 262144 --enable-reasoning
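
Both servers expose an OpenAI-compatible HTTP API, so a deployed model can be queried with a plain POST request. The sketch below only builds the request and uses the standard library. It assumes the server is reachable on `localhost`, that vLLM listens on its default port 8000 (the command above does not set one), and that the served model name equals the model path; adjust these if your deployment differs.

```python
import json
import urllib.request

# Port 1234 comes from the SGLang command above; 8000 is assumed as vLLM's default.
SGLANG_BASE_URL = "http://localhost:1234/v1"
VLLM_BASE_URL = "http://localhost:8000/v1"
# Assumption: the served model name defaults to the model path.
MODEL_NAME = "miromind-ai/MiroThinker-1.7-mini"


def build_chat_request(base_url: str, user_text: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build (but do not send) a POST request to the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; requires a running server."""
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With a server running, `send(build_chat_request(SGLANG_BASE_URL, "Hello"))` returns the decoded chat completion.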

For the best performance on agentic tasks, we recommend the following inference parameters:

temperature: 1.0
top_p: 0.95
repetition_penalty: 1.05
max_context_length: 262144
max_tokens: 16384
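
When calling the model through an OpenAI-compatible client, the parameters above could be passed as in the following sketch. `recommended_request_kwargs` is a hypothetical helper name. Two points are assumptions rather than documented behavior: `repetition_penalty` is not a standard OpenAI field and is forwarded here through the client's `extra_body` argument, and `max_context_length` is treated as a server-side limit (matching `--max-model-len 262144` in the vLLM command) rather than a per-request field.

```python
def recommended_request_kwargs() -> dict:
    """Keyword arguments for `client.chat.completions.create(...)`."""
    return {
        "temperature": 1.0,
        "top_p": 0.95,
        "max_tokens": 16384,
        # Not a standard OpenAI parameter; forwarding it via extra_body is an
        # assumption about the serving backend.
        "extra_body": {"repetition_penalty": 1.05},
    }

# max_context_length (262144) is assumed to be configured on the server,
# so it is intentionally absent from the request arguments.
```

Usage would then look like `client.chat.completions.create(model=model, messages=messages, **recommended_request_kwargs())`.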

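Recommended System Prompt

We use this unified XML-wrapped JSON format to describe and organize all tools. If you have additional tools, document them using the same structure and format to ensure consistent parsing, compatibility, and optimal performance across the environment.

System prompt example: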
In this environment you have access to a set of tools you can use to answer the user's question.
You only have access to the tools provided below. You can only use one tool per message, and will receive the result of that tool in the user's next response. You use tools step-by-step to accomplish a given task, with each tool-use informed by the result of the previous tool-use. Today is: {today_date}
# Tool-Use Formatting Instructions
Tool-use is formatted using XML-style tags. The tool-use is enclosed in <use_mcp_tool></use_mcp_tool> and each parameter is similarly enclosed within its own set of tags.
The Model Context Protocol (MCP) connects to servers that provide additional tools and resources to extend your capabilities. You can use the server's tools via the `use_mcp_tool`.
Description:
Request to use a tool provided by a MCP server. Each MCP server can provide multiple tools with different capabilities. Tools have defined input schemas that specify required and optional parameters.
Parameters:
- server_name: (required) The name of the MCP server providing the tool
- tool_name: (required) The name of the tool to execute
- arguments: (required) A JSON object containing the tool's input parameters, following the tool's input schema, quotes within string must be properly escaped, ensure it's valid JSON
Usage:
<use_mcp_tool>
<server_name>server name here</server_name>
<tool_name>tool name here</tool_name>
<arguments>
{
  "param1": "value1",
  "param2": "value2 \"escaped string\""
}
</arguments>
</use_mcp_tool>
Important Notes:
- Tool-use must be placed **at the end** of your response, **top-level**, and not nested within other tags.
- Always adhere to this format for the tool use to ensure proper parsing and execution.
String and scalar parameters should be specified as is, while lists and objects should use JSON format. Note that spaces for string values are not stripped. The output is not expected to be valid XML and is parsed with regular expressions.
Here are the functions available in JSONSchema format:
## Server name: tool-python
### Tool name: create_sandbox
Description: Create a linux sandbox.
    Args:
        timeout: Time in seconds before the sandbox is automatically shutdown. The default is 600 seconds.
    Returns:
        The id of the newly created sandbox. You should use this sandbox_id to run other tools in the sandbox.
Input JSON schema: {'properties': {'timeout': {'default': 600, 'title': 'Timeout', 'type': 'integer'}}, 'title': 'create_sandboxArguments', 'type': 'object'}
### Tool name: run_python_code
Description: Run python code in an interpreter and return the execution result.
    Args:
        code_block: The python code to run.
        sandbox_id: The id of the sandbox to run the code in. Reuse existing sandboxes whenever possible. To create a new sandbox, use tool `create_sandbox`.
    Returns:
        A result of the command execution, format like (stderr=..., stdout=..., exit_code=..., error=...)
Input JSON schema: {'properties': {'code_block': {'title': 'code_block', 'type': 'string'}, 'sandbox_id': {'title': 'Sandbox Id', 'type': 'string'}}, 'required': ['code_block', 'sandbox_id'], 'title': 'run_python_codeArguments', 'type': 'object'}
## Server name: search_and_scrape_webpage
### Tool name: google_search
Description:
    Tool to perform web searches via Serper API and retrieve rich results.
    It is able to retrieve organic search results, people also ask,
    related searches, and knowledge graph.
    Args:
        q: Search query string
        gl: Optional region code for search results in ISO 3166-1 alpha-2 format (e.g., 'us')
        hl: Optional language code for search results in ISO 639-1 format (e.g., 'en')
        location: Optional location for search results (e.g., 'SoHo, New York, United States', 'California, United States')
        num: Number of results to return (default: 10)
        tbs: Time-based search filter ('qdr:h' for past hour, 'qdr:d' for past day, 'qdr:w' for past week, 'qdr:m' for past month, 'qdr:y' for past year)
        page: Page number of results to return (default: 1)
        autocorrect: Whether to autocorrect spelling in query
    Returns:
        Dictionary containing search results and metadata.
Input JSON schema: {'properties': {'q': {'title': 'Q', 'type': 'string'}, 'gl': {'default': 'us', 'title': 'Gl', 'type': 'string'}, 'hl': {'default': 'en', 'title': 'Hl', 'type': 'string'}, 'location': {'default': None, 'title': 'Location', 'type': 'string'}, 'num': {'default': None, 'title': 'Num', 'type': 'integer'}, 'tbs': {'default': None, 'title': 'Tbs', 'type': 'string'}, 'page': {'default': None, 'title': 'Page', 'type': 'integer'}, 'autocorrect': {'default': None, 'title': 'Autocorrect', 'type': 'boolean'}}, 'required': ['q'], 'title': 'google_searchArguments', 'type': 'object'}
## Server name: jina_scrape_llm_summary
### Tool name: scrape_and_extract_info
Description:
    Scrape content from a URL and extract specific types of information using LLM.
    Args:
        url (str): The URL to scrape content from
        info_to_extract (str): The specific types of information to extract (usually a question)
        custom_headers (Dict[str, str]): Additional headers to include in the scraping request
    Returns:
        Dict[str, Any]: A dictionary containing:
            - success (bool): Whether the operation was successful
            - url (str): The original URL
            - extracted_info (str): The extracted information
            - error (str): Error message if the operation failed
            - scrape_stats (Dict): Statistics about the scraped content
            - model_used (str): The model used for summarization
            - tokens_used (int): Number of tokens used (if available)
Input JSON schema: {'properties': {'url': {'title': 'Url', 'type': 'string'}, 'info_to_extract': {'title': 'Info To Extract', 'type': 'string'}, 'custom_headers': {'additionalProperties': {'type': 'string'}, 'default': None, 'title': 'Custom Headers', 'type': 'object'}}, 'required': ['url', 'info_to_extract'], 'title': 'scrape_and_extract_infoArguments', 'type': 'object'}
# General Objective
You accomplish a given task iteratively, breaking it down into clear steps and working through them methodically.
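
To make the tool-call layout in the prompt above concrete, the following sketch renders a call in the `<use_mcp_tool>` format, for example when building test fixtures or few-shot examples. `format_tool_call` is a hypothetical helper, not part of the MiroThinker framework. Serializing the arguments with `json.dumps` takes care of the quote escaping that the prompt asks for.

```python
import json
import re


def format_tool_call(server_name: str, tool_name: str, arguments: dict) -> str:
    """Render a tool call in the <use_mcp_tool> layout described in the prompt."""
    return (
        "<use_mcp_tool>\n"
        f"<server_name>{server_name}</server_name>\n"
        f"<tool_name>{tool_name}</tool_name>\n"
        "<arguments>\n"
        # json.dumps escapes embedded quotes, keeping the arguments valid JSON.
        f"{json.dumps(arguments, ensure_ascii=False, indent=2)}\n"
        "</arguments>\n"
        "</use_mcp_tool>"
    )


# Example: a google_search call on the search_and_scrape_webpage server.
example_call = format_tool_call(
    "search_and_scrape_webpage",
    "google_search",
    {"q": 'the "quoted" phrase'},
)
```

Because the format is parsed with regular expressions rather than an XML parser, the JSON body only needs to be valid JSON between the `<arguments>` tags.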

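Minimal Runnable Example

The following example shows how to run an MCP-style tool-calling workflow, including system prompt generation, agent invocation, tool execution, and final response generation.

Before running the script, make sure the required environment variables are set: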

export OPENAI_API_KEY="your-api-key-here"
export BASE_URL="https://your-agent-endpoint.example.com/v1"
Python code example (requires the `openai` and `json_repair` packages):
import json
import os
import inspect
import re
from openai import OpenAI
from json_repair import repair_json
def get_weather(location: str, unit: str = "celsius") -> str:
    """
    Get weather information for a specified location (simulated)
    
    Args:
        location: Location name
        unit: Temperature unit, either celsius or fahrenheit
    
    Returns:
        JSON string with weather information
    """
    weather_data = {
        "London": {"temperature": 15, "condition": "sunny", "humidity": 45},
        "New York": {"temperature": 20, "condition": "cloudy", "humidity": 60},
        "Tokyo": {"temperature": 25, "condition": "rainy", "humidity": 75},
    }
    # Copy the entry so the unit conversion below does not mutate the lookup table.
    weather = dict(weather_data.get(location, {"temperature": 18, "condition": "unknown", "humidity": 50}))
    if unit == "fahrenheit":
        weather["temperature"] = weather["temperature"] * 9/5 + 32
        weather["unit"] = "°F"
    else:
        weather["unit"] = "°C"
    return json.dumps(weather, ensure_ascii=False)
def calculate(expression: str) -> str:
    """
    Calculate a mathematical expression
    
    Args:
        expression: Mathematical expression, e.g., "2 + 3 * 4"
    
    Returns:
        Calculation result
    """
    try:
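        # Demo only: eval is unsafe on untrusted input; builtins are disabled to limit exposure.
        result = eval(expression, {"__builtins__": {}}, {})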
        return json.dumps({"result": result, "expression": expression}, ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)}, ensure_ascii=False)
tools = [
    {"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "Location name"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit, default is celsius"}}, "required": ["location"]}}},
    {"type": "function", "function": {"name": "calculate", "parameters": {"type": "object", "properties": {"expression": {"type": "string", "description": "Mathematical expression to calculate, e.g., '2 + 3 * 4'"}}, "required": ["expression"]}}}
]
available_functions = {"get_weather": get_weather, "calculate": calculate}
def parse_mcp_tool_call(response_text: str):
    """Parse MCP-style tool call from model response. Returns first tool call or None."""
    match = re.search(r'<use_mcp_tool>(.*?)</use_mcp_tool>', response_text, re.DOTALL)
    if not match:
        return None
    content = match.group(1)
    server_match = re.search(r'<server_name>(.*?)</server_name>', content, re.DOTALL)
    tool_match = re.search(r'<tool_name>(.*?)</tool_name>', content, re.DOTALL)
    args_match = re.search(r'<arguments>(.*?)</arguments>', content, re.DOTALL)
    server_name = server_match.group(1).strip() if server_match else None
    tool_name = tool_match.group(1).strip() if tool_match else None
    if args_match:
        try:
            arguments = json.loads(args_match.group(1).strip())
        except json.JSONDecodeError as e:
            print(f"⚠️  Warning: Failed to parse arguments JSON: {e}, attempting to repair...")
            try:
                repaired = repair_json(args_match.group(1).strip())
                arguments = json.loads(repaired)
                print(f"✅  Successfully repaired JSON")
            except Exception as repair_error:
                print(f"❌  Failed to repair JSON: {repair_error}")
                arguments = {}
    else:
        arguments = {}
    if server_name and tool_name:
        return {"server_name": server_name, "tool_name": tool_name, "arguments": arguments}
    return None
def generate_mcp_system_prompt(openai_tools: list, available_functions: dict = None, server_name: str = "default", date: str = "2025-11-27") -> str:
    """Generate MCP-style system prompt from OpenAI tools format."""
    prefix = f"""
In this environment you have access to a set of tools you can use to answer the user's question.
You only have access to the tools provided below. You can only use one tool per message, and will receive the result of that tool in the user's next response. You use tools step-by-step to accomplish a given task, with each tool-use informed by the result of the previous tool-use. Today is: {date}
# Tool-Use Formatting Instructions
Tool-use is formatted using XML-style tags. The tool-use is enclosed in <use_mcp_tool></use_mcp_tool> and each parameter is similarly enclosed within its own set of tags.
The Model Context Protocol (MCP) connects to servers that provide additional tools and resources to extend your capabilities. You can use the server's tools via the `use_mcp_tool`.
Description:
Request to use a tool provided by a MCP server. Each MCP server can provide multiple tools with different capabilities. Tools have defined input schemas that specify required and optional parameters.
Parameters:
- server_name: (required) The name of the MCP server providing the tool
- tool_name: (required) The name of the tool to execute
- arguments: (required) A JSON object containing the tool's input parameters, following the tool's input schema, quotes within string must be properly escaped, ensure it's valid JSON
Usage:
<use_mcp_tool>
<server_name>server name here</server_name>
<tool_name>tool name here</tool_name>
<arguments>
{{
  "param1": "value1",
  "param2": "value2 \\"escaped string\\""
}}
</arguments>
</use_mcp_tool>
Important Notes:
- Tool-use must be placed **at the end** of your response, **top-level**, and not nested within other tags.
- Always adhere to this format for the tool use to ensure proper parsing and execution.
String and scalar parameters should be specified as is, while lists and objects should use JSON format. Note that spaces for string values are not stripped. The output is not expected to be valid XML and is parsed with regular expressions.
Here are the functions available in JSONSchema format:
## Server name: {server_name}
"""
    tools_section = []
    for i, tool in enumerate(openai_tools):
        if tool.get("type") == "function":
            func = tool["function"]
            tool_name = func["name"]
            func_obj = available_functions[tool_name]
            full_description = inspect.getdoc(func_obj) or func.get("description", "")
            if i > 0:
                tools_section.append("\n")
            tools_section.append(f"### Tool name: {tool_name}\nDescription: {full_description}\n\nInput JSON schema: {json.dumps(func['parameters'], ensure_ascii=False)}\n")
    suffix = "\n# General Objective\n\nYou accomplish a given task iteratively, breaking it down into clear steps and working through them methodically."
    return prefix + ''.join(tools_section) + suffix
def run_conversation(user_query: str, model: str = "MiroThinker"):
    """Run a complete conversation with tool calling"""
    system_prompt = generate_mcp_system_prompt(openai_tools=tools, available_functions=available_functions, server_name="My-Tools", date="2025-12-01")
    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "your-api-key-here"), base_url=os.environ.get("BASE_URL", "your-base-url-here"))
    print(f"\n{'='*60}\nUser Query: {user_query}\n{'='*60}\n")
    messages = [{'role': 'system', 'content': system_prompt}, {"role": "user", "content": user_query}]
    print("📤 Sending request to model...")
    response = client.chat.completions.create(model=model, messages=messages)
    response_message = response.choices[0].message
    # content can be None; fall back to an empty string so the regex parser does not fail.
    response_content = response_message.content or ""
    tool_call = parse_mcp_tool_call(response_content)
    print(f"📝 Model response:\n{response_content}\n")
    messages.append(response_message)
    if tool_call:
        server_name = tool_call["server_name"]
        tool_name = tool_call["tool_name"]
        function_args = tool_call["arguments"]
        print(f"\n🔧 Model decided to call tool:\n  - Server: {server_name}\n    Tool: {tool_name}\n    Args: {json.dumps(function_args, ensure_ascii=False)}")
        function_response = available_functions[tool_name](**function_args)
        print(f"    Result: {function_response}\n")
        messages.append({"role": "user", "content": function_response})
        print("📤 Requesting model to generate final response based on tool results...\n")
        second_response = client.chat.completions.create(model=model, messages=messages)
        final_message = second_response.choices[0].message.content
        print(f"💬 Final Response:\n{final_message}\n")
        return final_message
    else:
        print(f"💬 Model Response (no tool calls):\n{response_message.content}\n")
        return response_message.content
def main():
    """Run multiple examples"""
    run_conversation("What's the weather like in London?")
    # run_conversation("Calculate (25 + 15) * 3 - 10")
if __name__ == "__main__":
    main()
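
The script above performs a single tool call followed by one final response. Longer tasks repeat that cycle, and the tool-call limits mentioned earlier (up to 300 per task, 100 in the online demo) suggest bounding the loop. The following is a minimal sketch of such a loop, not the MiroThinker framework's implementation: `run_agent_loop`, `chat_fn`, and `_tag` are hypothetical names, and `chat_fn` stands in for any function that maps the message list to the model's reply text.

```python
import json
import re
from typing import Callable, Dict, List, Optional, Tuple

TOOL_CALL_RE = re.compile(r"<use_mcp_tool>(.*?)</use_mcp_tool>", re.DOTALL)


def _tag(name: str, text: str) -> Optional[str]:
    """Return the stripped content of <name>...</name>, or None if absent."""
    m = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return m.group(1).strip() if m else None


def run_agent_loop(
    chat_fn: Callable[[List[dict]], str],
    tools: Dict[str, Callable[..., str]],
    messages: List[dict],
    max_tool_calls: int = 300,
) -> Tuple[str, int]:
    """Alternate model turns and tool executions until the model stops
    calling tools or the tool-call budget is spent."""
    calls = 0
    while True:
        reply = chat_fn(messages)
        messages.append({"role": "assistant", "content": reply})
        match = TOOL_CALL_RE.search(reply)
        if match is None or calls >= max_tool_calls:
            # Either a final answer, or the budget is exhausted.
            return reply, calls
        block = match.group(1)
        try:
            args = json.loads(_tag("arguments", block) or "{}")
            result = tools[_tag("tool_name", block)](**args)
        except Exception as exc:
            # Report failures back to the model instead of aborting the loop.
            result = json.dumps({"error": str(exc)})
        calls += 1
        # Tool results are returned as a user message, as in the script above.
        messages.append({"role": "user", "content": result})
```

Passing `max_tool_calls=100` would mirror the limit of the online demo; when the budget runs out, this sketch simply returns the last reply, and a real agent might instead ask the model for a final answer.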

License

MiroThinker-1.7 is released under the Apache 2.0 license.

Citation

If you find this project useful in your research, please consider citing:

@article{miromind2025mirothinker,
  title={MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling},
  author={MiroMind Team and Bai, Song and Bing, Lidong and Chen, Carson and Chen, Guanzheng and Chen, Yuntao and Chen, Zhe and Chen, Ziyi and Dong, Xuan and others},
  journal={arXiv preprint arXiv:2511.11793},
  year={2025}
}

Contact Us

MiroThinker is developed by the MiroMind AI team. If you would like to leave us a message, feel free to get in touch. In addition to GitHub and Discord, you can reach us by email at service@miromind.ai.