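JoyAI-LLM-Flash-GGUF: suitable for knowledge Q&A, reasoning, code generation, and agentic tasks. The project is a medium-sized instruct language model with 3 billion activated parameters. It uses an MoE architecture, supports tool use and autonomous problem solving, and features the FiberPO optimization framework and high inference throughput.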

1. Model Introduction

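JoyAI-LLM-Flash is a state-of-the-art medium-sized instruct language model with 3 billion activated parameters and 48 billion total parameters. It was pre-trained on 20 trillion text tokens using the Muon optimizer, followed by large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM-Flash performs strongly on frontier knowledge, reasoning, and coding tasks, as well as in agentic capabilities.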

Key Features

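Fiber Bundle RL: Introduces fiber bundle theory into reinforcement learning and proposes a novel optimization framework, FiberPO. The method is designed for the challenges of large-scale, heterogeneous agent training and improves stability and robustness under complex data distributions.
Training-Inference Collaboration: Combines the Muon optimizer with dense MTP (multi-token prediction) and develops novel optimization techniques to address the instabilities that arise as the model scales, delivering 1.3x to 1.7x the throughput of the non-MTP version.
Agentic Intelligence: Designed for tool use, reasoning, and autonomous problem solving.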

2. Model Summary


Architecture	Mixture-of-Experts (MoE)
Total Parameters	48B
Activated Parameters	3B
Number of Layers (Dense layer included)	40
Number of Dense Layers	1
Attention Hidden Dimension	2048
MoE Hidden Dimension (per Expert)	768
Number of Attention Heads	32
Number of Experts	256
Selected Experts per Token	8
Number of Shared Experts	1
Vocabulary Size	129K
Context Length	128K
Attention Mechanism	MLA
Activation Function	SwiGLU
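
As a rough sanity check on the table above, the sketch below estimates how many parameters sit in the experts. It rests on assumptions the table does not state: that the "Attention Hidden Dimension" of 2048 is also the hidden size fed to the experts, that each SwiGLU expert consists of three weight matrices (gate, up, and down projections) without biases, and that all 39 non-dense layers are MoE layers. The results are therefore approximate.

```python
# Back-of-the-envelope expert parameter count from the summary table.
# Assumptions (not confirmed by the model card): hidden size = 2048,
# each SwiGLU expert = gate + up + down projections, no biases,
# every non-dense layer is an MoE layer.

HIDDEN = 2048          # Attention Hidden Dimension
EXPERT_HIDDEN = 768    # MoE Hidden Dimension (per Expert)
NUM_LAYERS = 40        # dense layer included
NUM_DENSE_LAYERS = 1
NUM_EXPERTS = 256      # routed experts per MoE layer
SELECTED_EXPERTS = 8   # routed experts chosen per token
SHARED_EXPERTS = 1     # always-active experts per MoE layer


def expert_params(hidden: int, expert_hidden: int) -> int:
    """Weights in one SwiGLU expert: three hidden x expert_hidden matrices."""
    return 3 * hidden * expert_hidden


moe_layers = NUM_LAYERS - NUM_DENSE_LAYERS
per_expert = expert_params(HIDDEN, EXPERT_HIDDEN)

# All routed experts across all MoE layers.
routed_total = per_expert * NUM_EXPERTS * moe_layers

# Experts that actually run for a single token (routed + shared).
active_per_token = per_expert * (SELECTED_EXPERTS + SHARED_EXPERTS) * moe_layers

print(f"parameters per expert:       {per_expert / 1e6:.2f}M")
print(f"routed expert parameters:    {routed_total / 1e9:.2f}B")
print(f"active expert params/token:  {active_per_token / 1e9:.2f}B")
print(f"routed experts used/token:   {SELECTED_EXPERTS / NUM_EXPERTS:.2%}")
```

Under these assumptions the routed experts hold roughly 47.1B parameters, and about 1.66B expert parameters are active for each token. The rest of the reported 48B total and 3B activated figures would presumably come from attention, embeddings, and the dense layer.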

3. Evaluation Results

Benchmark	JoyAI-LLM Flash	Qwen3-30B-A3B-Instruct-2507	GLM-4.7-Flash (Non-thinking)
Knowledge & Alignment
MMLU	89.50	86.87	80.53
MMLU-Pro	81.02	73.88	63.62
CMMLU	87.03	85.88	75.85
GPQA-Diamond	74.43	68.69	39.90
SuperGPQA	55.00	52.00	32.00
LiveBench	72.90	59.70	43.10
IFEval	86.69	83.18	82.44
AlignBench	8.24	8.07	6.85
HellaSwag	91.79	89.90	60.84
Coding
HumanEval	96.34	95.12	74.39
LiveCodeBench	65.60	39.71	27.43
SciCode	3.08/22.92	3.08/22.92	3.08/15.11
Mathematics
GSM8K	95.83	79.83	81.88
AIME2025	65.83	62.08	24.17
MATH 500	97.10	89.80	90.90
Agentic
SWE-bench Verified	60.60	24.44	51.60
Tau2-Retail	67.55	53.51	62.28
Tau2-Airline	54.00	32.00	52.00
Tau2-Telecom	79.83	4.39	88.60
Long Context
RULER	95.60	89.66	56.12

4. Deployment

[!Note] You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat, which provides OpenAI/Anthropic-compatible APIs.

We currently recommend running JoyAI-LLM-Flash-GGUF on the following inference engines:

Llama.cpp
Ollama
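
Both engines can expose an OpenAI-compatible HTTP endpoint, so a locally served GGUF model can be queried with plain HTTP. The sketch below only builds the request. It assumes both servers run on their default ports (8080 for llama.cpp's `llama-server`, 11434 for Ollama); the model name `joyai-llm-flash` and the helper names `build_chat_request` and `send` are hypothetical placeholders, so substitute whatever name your local server reports.

```python
import json
import urllib.request

# Default local endpoints (assumption: servers started with default ports).
BASE_URLS = {
    "llama.cpp": "http://localhost:8080/v1",
    "ollama": "http://localhost:11434/v1",
}


def build_chat_request(engine: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions POST request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Sampling parameters recommended by this model card.
        "temperature": 0.6,
        "top_p": 1.0,
    }
    return urllib.request.Request(
        url=f"{BASE_URLS[engine]}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# "joyai-llm-flash" is a hypothetical model name used for illustration.
req = build_chat_request("ollama", "joyai-llm-flash", "Hello!")
print(req.get_method(), req.full_url)
```

With a server running, `send(req)` would return the reply text. Nothing is sent at import time, so the snippet can be inspected without a live server.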

5. Model Usage

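The following usage examples demonstrate how to call our official API.

For third-party APIs deployed with vLLM or SGLang, please note:

[!Note] Recommended sampling parameters: temperature=0.6, top_p=1.0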
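
To make the two recommended parameters concrete, here is a minimal sketch of the textbook definitions of temperature scaling and top-p (nucleus) filtering. It is an illustration only: the function names are hypothetical and the samplers inside actual inference engines may be implemented differently.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.0]                         # toy logits for three tokens
probs_default = apply_temperature(logits, 1.0)
probs_recommended = apply_temperature(logits, 0.6)

# top_p=1.0 keeps every token here; a smaller value trims the tail.
full = top_p_filter(probs_recommended, 1.0)
trimmed = top_p_filter(probs_recommended, 0.9)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_recommended])
print([round(p, 3) for p in trimmed])
```

In this toy case, temperature=0.6 shifts probability toward the top token compared with temperature=1.0, while top_p=1.0 leaves all three candidates available.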

Chat Completion

This is a simple chat completion script that shows how to call the JoyAI-Flash API.

from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def simple_chat(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "which one is bigger, 9.11 or 9.9? think carefully.",
                }
            ],
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print(f"response: {response.choices[0].message.content}")


if __name__ == "__main__":
    simple_chat(client)
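
The script above waits for the complete reply (`stream=False`). For interactive use, OpenAI-compatible endpoints can usually stream tokens as they are generated. The sketch below assumes the serving endpoint supports `stream=True`; `join_deltas` and `stream_chat` are hypothetical helper names, not part of the JoyAI API.

```python
def join_deltas(chunks) -> str:
    """Print and concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some servers emit a trailing chunk without choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            print(delta, end="", flush=True)
    print()
    return "".join(parts)


def stream_chat(base_url: str, prompt: str) -> str:
    """Send one user message and stream the reply from an OpenAI-compatible server."""
    # Imported here so join_deltas can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="EMPTY")
    model_name = client.models.list().data[0].id
    stream = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=4096,
    )
    return join_deltas(stream)
```

Calling `stream_chat("http://IP:PORT/v1", "which one is bigger, 9.11 or 9.9?")` would print the reply incrementally and return the full text once the stream ends.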

Tool Call Completion

This is a simple tool call completion script that shows how to call the JoyAI-Flash API.

import json

from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


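def my_calculator(expression: str) -> str:
    # eval() runs arbitrary Python; acceptable for a local demo only.
    return str(eval(expression))


def rewrite(text: str) -> str:
    # Placeholder that returns the text unchanged; the parameter name
    # matches the "text" property declared in the tool schema.
    return str(text)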


def simple_tool_call(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "use my functions to compute the results for the equations: 6+1",
                },
            ],
        },
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "my_calculator",
                "description": "A calculator that can evaluate a mathematical equation and compute its results.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {
                            "type": "string",
                            "description": "The mathematical expression to evaluate.",
                        },
                    },
                    "required": ["expression"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "rewrite",
                "description": "Rewrite a given text for improved clarity",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "The input text to rewrite",
                        }
                    },
                    "required": ["text"],
                },
            },
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
        tools=tools,
        tool_choice="auto",
    )
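    message = response.choices[0].message
    tool_calls = message.tool_calls
    if not tool_calls:
        # The model answered directly without requesting a tool.
        print(message.content)
        return

    # Run every requested tool so that each tool_call_id gets exactly one reply.
    results = []
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        if function_name == "my_calculator":
            result = my_calculator(**function_args)
        elif function_name == "rewrite":
            result = rewrite(function_args.get("text", ""))
        else:
            result = f"unknown tool: {function_name}"
        results.append(result)
    messages.append({"role": "assistant", "tool_calls": tool_calls})
    for tool_call, result in zip(tool_calls, results):
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
                "content": result,
            }
        )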
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
    )
    print(response.choices[0].message.content)


if __name__ == "__main__":
    simple_tool_call(client)
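
The `my_calculator` tool above passes model-generated text to `eval`, which can execute arbitrary code. Outside of a local demo, a restricted evaluator is a safer choice. The sketch below is one possible replacement, not part of the original example: `safe_calculator` is a hypothetical name, and it deliberately supports only `+`, `-`, `*`, `/`, and parentheses on numeric literals.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculator(expression: str) -> str:
    """Evaluate basic arithmetic on numeric literals; raise ValueError otherwise."""

    def evaluate(node):
        # Plain int/float literals (bool and str constants are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    # mode="eval" parses a single expression; statements raise SyntaxError.
    tree = ast.parse(expression, mode="eval")
    return str(evaluate(tree.body))


print(safe_calculator("6+1"))
```

Because the function keeps the same `expression: str -> str` signature, it could be swapped in for `my_calculator` without changing the tool schema. Function calls, attribute access, and names all fall through to the `ValueError` branch.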

6. License

Both the code repository and the model weights are released under the Modified MIT License.