jd-opensource/JoyAI-LLM-Flash-INT8
JoyAI-LLM Flash


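1. Model Introduction

JoyAI-LLM-Flash is an advanced mid-sized instruction-tuned language model with 3B activated parameters and 48B total parameters. It was pre-trained on 20 trillion text tokens using the Muon optimizer, followed by large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM-Flash performs strongly on frontier knowledge, reasoning, and coding tasks, and in agentic capabilities.

Key Features

  • Fiber Bundle RL: Brings fiber bundle theory into reinforcement learning through FiberPO, a new optimization framework designed for the challenges of large-scale, heterogeneous agent training. It improves stability and robustness under complex data distributions.
  • Training-Inference Collaboration: Combines the Muon optimizer with dense MTP (multi-token prediction) and introduces new optimization techniques to address instability during scaling, reaching 1.3x to 1.7x the throughput of the non-MTP version.
  • Agentic Intelligence: Designed for tool use, reasoning, and autonomous problem solving.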

2. Model Summary

| Property | Value |
|:---|:---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 48B |
| Activated Parameters | 3B |
| Number of Layers (Dense layer included) | 40 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 2048 |
| MoE Hidden Dimension (per Expert) | 768 |
| Number of Attention Heads | 32 |
| Number of Experts | 256 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 129K |
| Context Length | 128K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
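As a rough sanity check on the table above, the expert weights alone can be counted from the listed dimensions. This sketch assumes each expert is a bias-free SwiGLU MLP (gate, up and down projections) and that the model width equals the 2048 attention hidden dimension; neither assumption is stated explicitly in this card.

```python
# Back-of-the-envelope count of expert weights in JoyAI-LLM Flash.
# Assumptions (not stated in the model card): each expert is a bias-free
# SwiGLU MLP with gate, up and down projections, and the model width equals
# the 2048 "attention hidden dimension" from the table.

HIDDEN = 2048          # model width (assumed = attention hidden dimension)
EXPERT_FFN = 768       # MoE hidden dimension per expert
NUM_LAYERS = 40        # total layers, including the dense one
NUM_DENSE_LAYERS = 1
ROUTED_EXPERTS = 256
SHARED_EXPERTS = 1
TOP_K = 8              # routed experts selected per token


def swiglu_expert_params(hidden: int, ffn: int) -> int:
    """Weights in one SwiGLU expert: gate (h x f), up (h x f), down (f x h)."""
    return 3 * hidden * ffn


def expert_param_counts() -> tuple[int, int]:
    """Return (total, activated-per-token) expert weights over all MoE layers."""
    per_expert = swiglu_expert_params(HIDDEN, EXPERT_FFN)
    moe_layers = NUM_LAYERS - NUM_DENSE_LAYERS
    total = moe_layers * (ROUTED_EXPERTS + SHARED_EXPERTS) * per_expert
    activated = moe_layers * (TOP_K + SHARED_EXPERTS) * per_expert
    return total, activated


total, activated = expert_param_counts()
print(f"expert weights, total:     {total / 1e9:.2f}B")
print(f"expert weights, activated: {activated / 1e9:.2f}B")
```

Under these assumptions the experts account for about 47.3B of the 48B total and about 1.66B of the 3B activated parameters; the remainder would come from attention (MLA), the dense layer, the routers and the embeddings, which the sketch does not count.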

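3. Evaluation Results

| Benchmark | JoyAI-LLM Flash | Qwen3-30B-A3B-Instruct-2507 | GLM-4.7-Flash (Non-thinking) |
|:---|:---:|:---:|:---:|
| **Knowledge & Alignment** | | | |
| MMLU | 89.50 | 86.87 | 80.53 |
| MMLU-Pro | 81.02 | 73.88 | 63.62 |
| CMMLU | 87.03 | 85.88 | 75.85 |
| GPQA-Diamond | 74.43 | 68.69 | 39.90 |
| SuperGPQA | 55.00 | 52.00 | 32.00 |
| LiveBench | 72.90 | 59.70 | 43.10 |
| IFEval | 86.69 | 83.18 | 82.44 |
| AlignBench | 8.24 | 8.07 | 6.85 |
| HellaSwag | 91.79 | 89.90 | 60.84 |
| **Coding** | | | |
| HumanEval | 96.34 | 95.12 | 74.39 |
| LiveCodeBench | 65.60 | 39.71 | 27.43 |
| SciCode | 3.08/22.92 | 3.08/22.92 | 3.08/15.11 |
| **Mathematics** | | | |
| GSM8K | 95.83 | 79.83 | 81.88 |
| AIME 2025 | 65.83 | 62.08 | 24.17 |
| MATH 500 | 97.10 | 89.80 | 90.90 |
| **Agentic** | | | |
| SWE-bench Verified | 60.60 | 24.44 | 51.60 |
| Tau2-Retail | 67.55 | 53.51 | 62.28 |
| Tau2-Airline | 54.00 | 32.00 | 52.00 |
| Tau2-Telecom | 79.83 | 4.39 | 88.60 |
| **Long-Context Understanding** | | | |
| RULER | 95.60 | 89.66 | 56.12 |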

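4. Deployment

[!Note] You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat. We provide OpenAI- and Anthropic-compatible APIs.

Currently, we recommend running JoyAI-LLM-Flash-Block-INT8 on the following inference engine:

  • SGLang

Deployment examples can be found in the Model Deployment Guide.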

5. Model Usage

The following demos show how to call our official API.

For third-party APIs deployed with vLLM or SGLang, please note:

[!Note] Recommended sampling parameters: temperature=0.6, top_p=1.0
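One way to apply these recommended values consistently is to build the request arguments in one place. `build_chat_request` and `RECOMMENDED_SAMPLING` are hypothetical helper names, and the `max_tokens` default simply mirrors the chat script in this section.

```python
# Build chat-completion arguments that apply the recommended sampling settings
# (temperature=0.6, top_p=1.0). The helper names and the max_tokens default are
# illustrative choices, not part of the JoyAI-LLM Flash API.
from typing import Any

RECOMMENDED_SAMPLING = {"temperature": 0.6, "top_p": 1.0}


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    max_tokens: int = 4096,
    **overrides: Any,
) -> dict[str, Any]:
    """Return keyword arguments for client.chat.completions.create(...).

    The recommended sampling values are applied first; anything passed in
    `overrides` (for example temperature=1.0) replaces them.
    """
    request: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        **RECOMMENDED_SAMPLING,
    }
    request.update(overrides)
    return request
```

With an `openai.OpenAI` client, a call would look like `client.chat.completions.create(**build_chat_request(model_name, messages))`.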

Chat Completion

This is a simple chat completion script showing how to call the JoyAI-Flash API.

from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def simple_chat(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "which one is bigger, 9.11 or 9.9? think carefully.",
                }
            ],
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print(f"response: {response.choices[0].message.content}")


if __name__ == "__main__":
    simple_chat(client)
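For long answers it can help to print tokens as they arrive. The sketch below is a hypothetical streaming variant of `simple_chat`: it passes `stream=True` to the same OpenAI-compatible endpoint and applies the recommended sampling parameters. `iter_text` and `stream_chat` are illustrative names, and the code assumes the server emits standard OpenAI-style streaming chunks.

```python
# Streaming variant of simple_chat (illustrative sketch). `client` is assumed
# to be an openai.OpenAI instance pointed at an OpenAI-compatible endpoint.
from typing import Any, Iterable, Iterator


def iter_text(stream: Iterable[Any]) -> Iterator[str]:
    """Yield the text delta of each streamed chunk, skipping empty ones.

    Some servers send chunks with no choices (e.g. a final usage chunk) or
    with delta.content set to None, so both cases are filtered out.
    """
    for chunk in stream:
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            yield text


def stream_chat(client: Any, prompt: str) -> str:
    """Send one user message with stream=True and return the full reply."""
    model_name = client.models.list().data[0].id
    stream = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=4096,
        temperature=0.6,  # recommended sampling parameters
        top_p=1.0,
    )
    parts = []
    for text in iter_text(stream):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

With the `client` created in the script above, call `stream_chat(client, "which one is bigger, 9.11 or 9.9? think carefully.")`.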

Tool Call Completion

This is a simple tool-call completion script showing how to call the JoyAI-Flash API.

import json

from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


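def my_calculator(expression: str) -> str:
    # Demo only: eval() executes arbitrary Python; do not use it on untrusted input.
    return str(eval(expression))


def rewrite(text: str) -> str:
    # Placeholder: returns the text unchanged. The parameter name matches the
    # "text" property declared in the tool schema below.
    return str(text)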


def simple_tool_call(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "use my functions to compute the results for the equations: 6+1",
                },
            ],
        },
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "my_calculator",
                "description": "A calculator that can evaluate a mathematical equation and compute its results.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {
                            "type": "string",
                            "description": "The mathematical expression to evaluate.",
                        },
                    },
                    "required": ["expression"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "rewrite",
                "description": "Rewrite a given text for improved clarity",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "The input text to rewrite",
                        }
                    },
                },
            },
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
        tools=tools,
        tool_choice="auto",
    )
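    message = response.choices[0].message
    tool_calls = message.tool_calls or []
    if not tool_calls:
        # The model answered directly without requesting a tool call.
        print(message.content)
        return

    # Run every requested tool so that results stay aligned with tool_calls.
    results = []
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        if function_name == "my_calculator":
            result = my_calculator(**function_args)
        elif function_name == "rewrite":
            result = rewrite(function_args.get("text", ""))
        else:
            result = f"Unknown tool: {function_name}"
        results.append(result)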
    messages.append({"role": "assistant", "tool_calls": tool_calls})
    for tool_call, result in zip(tool_calls, results):
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
                "content": result,
            }
        )
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
    )
    print(response.choices[0].message.content)


if __name__ == "__main__":
    simple_tool_call(client)
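Because `eval()` on a model-generated string can execute arbitrary Python, a safer replacement for `my_calculator` may be worth considering. The sketch below is an assumption, not part of the official example: it parses the expression with Python's `ast` module and evaluates only numeric literals combined with basic arithmetic operators, rejecting everything else. `safe_calculator` and `_eval_node` are hypothetical names.

```python
# Restricted arithmetic evaluator (illustrative sketch): walks the AST and
# only allows numeric literals with +, -, *, /, //, %, ** and unary +/-.
import ast
import operator

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST) -> float:
    """Recursively evaluate an allowed AST node; raise ValueError otherwise."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_calculator(expression: str) -> str:
    """Evaluate a plain arithmetic expression and return the result as text."""
    tree = ast.parse(expression, mode="eval")
    return str(_eval_node(tree))
```

Since it keeps the `expression` keyword, it can replace `my_calculator` in the dispatch loop via `safe_calculator(**function_args)`. Exponentiation with very large operands can still be slow, so dropping `ast.Pow` from `_BIN_OPS` is an option if that matters.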

6. License

Both the code repository and the model weights are released under the Modified MIT License.