Ornith-1.0-9B

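Hello everyone! 🌺 Today we are officially releasing Ornith-1.0, a family of self-improving, open-source agentic coding models.

Key highlights:

Top-tier coding agents: available in 9B-Dense, 31B-Dense, 35B-MoE, and 397B-MoE variants (post-trained from Gemma 4 and Qwen 3.5). On coding benchmarks including Terminal-Bench 2.1, SWE-Bench, NL2Repo, and OpenClaw, they achieve top performance among open-source models of comparable size.
Self-improving training framework: Ornith-1.0 uses reinforcement learning to learn not only how to generate solution rollouts, but also the scaffold that drives those rollouts. By jointly optimizing the scaffold and the generated solutions, the model can discover better search trajectories and produce higher-quality results.
License: released under the MIT license, accessible worldwide, with no regional usage restrictions.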

Ornith 1.0 9B

This model card describes Ornith-1.0-9B, the lightest member of the Ornith family, designed for efficient single-GPU deployment.

Benchmarks

Benchmark	Ornith-1.0-9B	Qwen3.5-9B	Qwen3.5-35B	Gemma4-12B	Gemma4-31B
Agentic coding
Terminal-Bench 2.1 (Terminus-2)	43.1	21.3	41.4	21	42.1
Terminal-Bench 2.1 (Claude Code)	40.6	18.9	38.9	-	-
SWE-bench Verified	69.4	53.2	70	44.2	52
SWE-bench Pro	42.9	31.3	44.6	27.6	35.7
SWE-bench Multilingual	52	39.7	60.3	32.5	51.7
NL2Repo	27.2	16.2	20.5	10.3	15.5
Claw-eval Avg	63.1	53.2	65.4	32.5	48.5
SWE Atlas - QnA	17.9	9.2	13.2	-	-
SWE Atlas - RF	16.6	4.3	10.2	-	-
SWE Atlas - TW	15.3	4.4	9.8	-	-

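* Terminal-Bench 2.1 (Terminus-2): we evaluate Terminal-Bench 2.1 with the Harbor/Terminus-2 harness, using parser=json, temperature=1.0, top_p=1.0, and a 128K context window. Each run has a 4-hour timeout with 32 CPU cores and 48 GB of RAM, and results are averaged over 5 runs. We adjusted the Qwen chat template to keep training and inference consistent (https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B/blob/main/chat_template.jinja) and patched Harbor to handle vLLM's reasoning_content key.
* Terminal-Bench 2.1 (Claude Code): we evaluate Terminal-Bench 2.1 with Claude Code 2.1.126, using parser=json, temperature=1.0, top_p=1.0, and max_new_tokens=131072. Results are averaged over 5 runs. The same Qwen chat template modification is required.
* SWE-Bench Verified, Pro, and Multilingual: evaluated with the OpenHands harness, using temperature=1.0, top_p=0.95, and a 256K context window.
* SWE Atlas QnA, RF, TW: evaluated with the mini SWE agent harness, using temperature=1.0, top_p=0.95, and a 128K context window. Results are averaged over 5 runs.
* NL2Repo: evaluated with temperature=1.0, top_p=1.0, 400K context, and 48K output, with anti-hacking filtering enabled.
* ClawEval: an agentic coding benchmark based on the distribution of real user tasks; evaluated with temperature=0.6 and a 256K context window.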

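Quick start

📝 Note

Ornith-1.0-9B is a reasoning model: by default, each assistant reply opens with a <think> … </think> block, followed by the final answer. The serving recipes below enable a reasoning parser, so the chain of thought is returned in a separate reasoning_content field. They also enable a tool-call parser, so the model's <tool_call> blocks are converted into OpenAI-style tool_calls.

Deploying Ornith-1.0-9B requires recent versions of the runtime: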

Transformers ≥ 5.8.1
vLLM ≥ 0.19.1
SGLang ≥ 0.5.9

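Recommended sampling parameters: temperature=0.6, top_p=0.95, top_k=20 (to reproduce the reported benchmark settings, use temperature=1.0).

Deploying Ornith-1.0-9B

Ornith-1.0-9B is a dense model with about 9 billion parameters (about 19 GB in bf16), so it fits comfortably on a single 80 GB GPU. The recipes below start an OpenAI-compatible API server; to shard across multiple GPUs, add --tensor-parallel-size (vLLM) or --tp (SGLang).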
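
The weight footprint quoted above follows from simple arithmetic: parameter count times bytes per parameter. The sketch below is a rough estimate only. The round figure of 9e9 parameters and the byte widths for other formats are assumptions for illustration, so the bf16 result comes out slightly below the roughly 19 GB quoted for the actual checkpoint. KV cache and activations need additional memory on top of the weights.

```python
# Rough estimate of the memory needed just to hold model weights.
# The parameter count and byte widths are illustrative assumptions.

BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return the weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # 9e9 is a round-number stand-in for "about 9 billion parameters".
    for dtype in ("bf16", "fp8", "int4"):
        print(f"{dtype}: {weight_memory_gb(9e9, dtype):.1f} GB")
```

The gap between this estimate and an 80 GB card is what leaves room for the KV cache at long context lengths.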

vLLM

vllm serve deepreinforce-ai/Ornith-1.0-9B \
    --served-model-name Ornith-1.0-9B \
    --host 0.0.0.0 --port 8000 \
    --max-model-len 262144 \
    --gpu-memory-utilization 0.90 \
    --enable-prefix-caching \
    --enable-auto-tool-choice --tool-call-parser qwen3_xml \
    --reasoning-parser qwen3 \
    --trust-remote-code

SGLang

python -m sglang.launch_server \
    --model-path deepreinforce-ai/Ornith-1.0-9B \
    --served-model-name Ornith-1.0-9B \
    --host 0.0.0.0 --port 8000 \
    --context-length 262144 \
    --mem-fraction-static 0.85 \
    --tool-call-parser qwen3_coder \
    --reasoning-parser qwen3

Hugging Face Transformers

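For quick local testing (or offline generation scripts), you can load the model directly with Transformers. Make sure a recent version is installed (see the Transformers installation guide); Ornith-1.0-9B requires transformers >= 5.8.1.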

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepreinforce-ai/Ornith-1.0-9B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function is_prime(n). Keep it short."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
generated = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
output_ids = generated[0][inputs.input_ids.shape[1]:]

# The reply contains a <think> ... </think> reasoning block followed by the answer.
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)

To separate the reasoning from the final answer, split on the </think> marker:

text = tokenizer.decode(output_ids, skip_special_tokens=True)
if "</think>" in text:
    reasoning, answer = text.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
    answer = answer.strip()
else:
    reasoning, answer = "", text.strip()

Using Ornith-1.0-9B via the Chat Completions API

Once the vLLM or SGLang server is running, you can talk to it with any OpenAI-compatible client.

Basic usage

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # any non-empty string works for a local server
)

response = client.chat.completions.create(
    model="Ornith-1.0-9B",
    messages=[
        {"role": "user", "content": "Write a one-line Python lambda that squares a number."}
    ],
    temperature=0.6,
    top_p=0.95,
    max_tokens=1024,
)

message = response.choices[0].message
# reasoning_content holds the <think> trace; content holds the final answer.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
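
The request above omits the recommended top_k=20, because top_k is not a named argument of the OpenAI Python client. A minimal sketch of one way to pass it, assuming the server accepts top_k as an extra request field (vLLM documents this through extra_body; check your server's behavior). The preset names and the helper build_sampling_kwargs are hypothetical, introduced only for illustration.

```python
# Sampling presets taken from the values quoted in this card.
# The preset names and this helper are hypothetical, not part of any library.
SAMPLING_PRESETS = {
    "recommended": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    # Settings listed in the Terminal-Bench 2.1 evaluation notes.
    "terminal_bench": {"temperature": 1.0, "top_p": 1.0},
}

# Fields the OpenAI Python client accepts as named arguments.
STANDARD_FIELDS = {"temperature", "top_p", "max_tokens"}


def build_sampling_kwargs(preset: str, max_tokens: int = 1024) -> dict:
    """Split a preset into standard arguments and an extra_body payload."""
    params = dict(SAMPLING_PRESETS[preset], max_tokens=max_tokens)
    kwargs = {k: v for k, v in params.items() if k in STANDARD_FIELDS}
    extra = {k: v for k, v in params.items() if k not in STANDARD_FIELDS}
    if extra:
        # Non-standard fields travel in the raw request body.
        kwargs["extra_body"] = extra
    return kwargs
```

With a client set up as in the basic example, the result can be unpacked into the call: client.chat.completions.create(model="Ornith-1.0-9B", messages=messages, **build_sampling_kwargs("recommended")).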

You can also stream tokens or give the model tools. Ornith-1.0-9B emits well-formed function calls, which the server parses into the standard tool_calls field:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="Ornith-1.0-9B",
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",
    temperature=0.6,
    max_tokens=2048,
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
# -> get_weather {"city": "Paris"}
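
Streaming is mentioned above but not shown. The following is a minimal sketch, assuming the server streams the reasoning trace under the same reasoning_content key it uses for non-streamed replies (some server versions may name this field differently). collect_stream, stream_chat, and fake_chunk are hypothetical helper names; fake_chunk exists only so the accumulation logic can be exercised without a running server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning and answer text from chat-completion stream chunks."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # some servers send a final chunk with no choices
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def stream_chat(prompt, base_url="http://localhost:8000/v1"):
    """Stream one reply from a running server (requires the openai package)."""
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="EMPTY")
    stream = client.chat.completions.create(
        model="Ornith-1.0-9B",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.6,
        top_p=0.95,
        max_tokens=1024,
        stream=True,
    )
    return collect_stream(stream)


def fake_chunk(content=None, reasoning_content=None):
    """Build a stand-in chunk with the same attribute layout, for offline checks."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

To print tokens as they arrive instead of collecting them, replace the append calls with print(..., end="", flush=True).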

You can point any OpenAI-compatible SDK (Python, Node.js, etc.) or curl at the same /v1/chat/completions endpoint.
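
For clients without an SDK, the same request can be sent as plain JSON over HTTP. A minimal sketch using only the Python standard library, assuming the local server address and served model name from the recipes above; build_chat_request and send are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000/v1",
                       model="Ornith-1.0-9B"):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Local servers typically accept any non-empty key.
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling send(build_chat_request("Hello")) returns the decoded response body; the answer text is under choices[0].message.content.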

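Agentic usage

Ornith-1.0-9B performs strongly at tool calling and agentic coding.

Agent frameworks

Because Ornith-1.0-9B exposes an OpenAI-compatible endpoint with tool calling support, it works directly with standard agent frameworks. Below is a simple example that gives Ornith-1.0-9B a tool through the same function-calling interface that an MCP client would use to expose MCP server tools.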

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "EMPTY"),
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Run a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "The command to run"}
                },
                "required": ["command"],
            },
        },
    }
]

messages = [{"role": "user", "content": "List the Python files in the current directory."}]

response = client.chat.completions.create(
    model="Ornith-1.0-9B",  # must match --served-model-name from the serving recipes
    messages=messages,
    tools=tools,
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message)
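
The example above stops once the model has requested a tool. A complete agent turn also executes the call and sends the result back as a tool message. The sketch below shows one way to do that, assuming the run_shell tool defined above; TOOL_REGISTRY, execute_tool_call, tool_result_message, and agent_loop are hypothetical names, not part of any framework.

```python
import json
import subprocess


def run_shell(command: str, timeout: int = 30) -> str:
    """Run a shell command and return combined stdout and stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout + result.stderr).strip()


TOOL_REGISTRY = {"run_shell": run_shell}


def execute_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch one tool call; errors come back as text so the model can react."""
    if name not in TOOL_REGISTRY:
        return f"error: unknown tool {name!r}"
    try:
        args = json.loads(arguments_json)
        return TOOL_REGISTRY[name](**args)
    except Exception as exc:  # report failures instead of crashing the loop
        return f"error: {exc}"


def tool_result_message(tool_call_id: str, output: str) -> dict:
    """Format a tool result as an OpenAI-style tool message."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": output}


def agent_loop(client, model, messages, tools, max_turns=8):
    """Alternate between model replies and tool execution until a final answer."""
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools,
            temperature=0.6, top_p=0.95,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content
        # Echo the assistant turn back, then append one result per tool call.
        messages.append({
            "role": "assistant",
            "content": message.content or "",
            "tool_calls": [
                {"id": c.id, "type": "function",
                 "function": {"name": c.function.name,
                              "arguments": c.function.arguments}}
                for c in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            output = execute_tool_call(call.function.name, call.function.arguments)
            messages.append(tool_result_message(call.id, output))
    return None  # turn budget exhausted without a final answer
```

Because run_shell executes whatever command the model proposes, a loop like this should only run inside a sandbox or container.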

Examples of using Ornith with agent harnesses:

Hermes Agent

# Hermes talks to any OpenAI-compatible endpoint — point it at your Ornith server.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="deepreinforce-ai/Ornith-1.0-9B"

Atomic.chat / Ollama / llama.cpp

# Both runtimes load a GGUF build of Ornith (publish one at deepreinforce-ai/Ornith-1.0-9B-GGUF).

# llama.cpp — serve an OpenAI-compatible API on port 8000.
llama-server -hf deepreinforce-ai/Ornith-1.0-9B-GGUF --port 8000 -c 262144

# Ollama — pull and chat with the same GGUF straight from Hugging Face.
ollama run hf.co/deepreinforce-ai/Ornith-1.0-9B-GGUF

OpenClaw

# OpenClaw talks to any OpenAI-compatible endpoint — point it at your Ornith server.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export OPENAI_MODEL="deepreinforce-ai/Ornith-1.0-9B"

Unsloth Studio

pip install unsloth

# Load Ornith for fast local inference or fine-tuning (Python):
#   from unsloth import FastLanguageModel
#   model, tokenizer = FastLanguageModel.from_pretrained(
#       "deepreinforce-ai/Ornith-1.0-9B",
#       max_seq_length=262144,
#       load_in_4bit=True,
#   )

OpenHands

pip install openhands-ai

# OpenHands routes through LiteLLM; the "openai/" prefix selects the OpenAI-compatible path.
export LLM_MODEL="openai/deepreinforce-ai/Ornith-1.0-9B"
export LLM_BASE_URL="http://localhost:8000/v1"
export LLM_API_KEY="EMPTY"

# Launch the CLI (or run the official OpenHands Docker image with the same env vars).
openhands

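Coding CLIs

Ornith-1.0-9B is optimized for terminal-based coding agents. Point any OpenAI-compatible coding CLI at your Ornith-1.0-9B endpoint (set OPENAI_BASE_URL and OPENAI_API_KEY) to understand large codebases, automate tedious work, and ship faster.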

OpenCode

# Register your local Ornith endpoint as a provider in ~/.config/opencode/opencode.json:
#
# {
#   "$schema": "https://opencode.ai/config.json",
#   "provider": {
#     "ornith": {
#       "npm": "@ai-sdk/openai-compatible",
#       "name": "Ornith (local)",
#       "options": { "baseURL": "http://localhost:8000/v1", "apiKey": "EMPTY" },
#       "models": { "deepreinforce-ai/Ornith-1.0-9B": { "name": "Ornith-1.0-9B" } }
#     }
#   }
# }

opencode

Citation

If you find our work helpful, please consider citing it.

@misc{ornith_9b,
    title = {{Ornith-1.0-9B}: Agentic Coding, Open to All},
    url = {https://deep-reinforce.com/ornith_1_0.html},
    author = {{DeepReinforce Team}},
    year = {2026}
}