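JoyAI-LLM-Flash-FP8: a mid-sized instruction-tuned language model for knowledge Q&A, reasoning, coding, and agentic tasks. It uses a Mixture-of-Experts (MoE) architecture, is trained with FiberPO optimization and training-inference co-design, supports tool use, and can be deployed in FP8 precision. [This summary was AI-generated.]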

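1. Model Introduction

JoyAI-LLM Flash is a leading mid-sized instruction-tuned language model with 3 billion activated parameters and 48 billion total parameters. It was pretrained on 20 trillion text tokens with the Muon optimizer, then refined through large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM Flash performs strongly on frontier knowledge, reasoning, coding, and agentic tasks.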

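Key Features

Fibration Policy Optimization: brings fiber bundle theory into reinforcement learning and proposes a novel optimization framework, FiberPO. The method is designed for the challenges of large-scale, heterogeneous agent training and improves training stability and robustness under complex data distributions. (Paper link)

Training-Inference Collaboration: combines the Muon optimizer with dense MTP and develops novel optimization techniques to resolve instabilities when scaling the model, reaching 1.3x to 1.7x the throughput of the non-MTP version.

Agentic Intelligence: designed for tool use, reasoning, and autonomous problem solving.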

2. Model Summary

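| Property | Value |
| --- | --- |
| Architecture | Mixture-of-Experts (MoE) |
| Total parameters | 48B |
| Activated parameters | 3B |
| Number of layers (including dense layers) | 40 |
| Number of dense layers | 1 |
| Attention hidden dimension | 2048 |
| MoE hidden dimension (per expert) | 768 |
| Number of attention heads | 32 |
| Number of experts | 256 |
| Experts selected per token | 8 |
| Number of shared experts | 1 |
| Vocabulary size | 129K |
| Context length | 128K |
| Attention mechanism | MLA |
| Activation function | SwiGLU |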
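The figures in the model summary can be cross-checked with a rough count of the expert weights. The sketch below is only a back-of-the-envelope estimate under two assumptions that the model card does not state: that each SwiGLU expert holds three weight matrices (gate, up, down) of shape hidden x expert-hidden, and that the model hidden size equals the listed attention hidden dimension. Attention, embedding, router, dense-layer, and MTP weights are ignored.

```python
# Rough expert-parameter estimate from the model summary values.
# ASSUMPTIONS (not from the model card): three weight matrices per SwiGLU
# expert, and model hidden size == "attention hidden dimension".

HIDDEN = 2048             # attention hidden dimension
EXPERT_HIDDEN = 768       # MoE hidden dimension per expert
NUM_LAYERS = 40           # total layers, including dense layers
NUM_DENSE_LAYERS = 1
NUM_EXPERTS = 256         # routed experts per MoE layer
EXPERTS_PER_TOKEN = 8     # routed experts selected per token
NUM_SHARED_EXPERTS = 1    # shared expert, always active

moe_layers = NUM_LAYERS - NUM_DENSE_LAYERS
params_per_expert = 3 * HIDDEN * EXPERT_HIDDEN  # gate + up + down projections

# All routed and shared experts stored in the checkpoint.
total_expert_params = (
    moe_layers * (NUM_EXPERTS + NUM_SHARED_EXPERTS) * params_per_expert
)
# Experts that actually run for a single token.
active_expert_params = (
    moe_layers * (EXPERTS_PER_TOKEN + NUM_SHARED_EXPERTS) * params_per_expert
)

print(f"expert params, total:  {total_expert_params / 1e9:.2f}B")
print(f"expert params, active: {active_expert_params / 1e9:.2f}B")
```

Under these assumptions the experts account for about 47.3B of the 48B total parameters and about 1.66B of the 3B activated parameters, which is consistent with the listed sizes once the ignored components are added back.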

3. Evaluation Results

| Benchmark | JoyAI-LLM Flash | Qwen3-30B-A3B-Instruct-2507 | GLM-4.7-Flash (Non-thinking) |
| --- | --- | --- | --- |
| **Knowledge & Alignment** | | | |
| MMLU | 89.50 | 86.87 | 80.53 |
| MMLU-Pro | 81.02 | 73.88 | 63.62 |
| CMMLU | 87.03 | 85.88 | 75.85 |
| GPQA-Diamond | 74.43 | 68.69 | 39.90 |
| SuperGPQA | 55.00 | 52.00 | 32.00 |
| LiveBench | 72.90 | 59.70 | 43.10 |
| IFEval | 86.69 | 83.18 | 82.44 |
| AlignBench | 8.24 | 8.07 | 6.85 |
| HellaSwag | 91.79 | 89.90 | 60.84 |
| **Coding** | | | |
| HumanEval | 96.34 | 95.12 | 74.39 |
| LiveCodeBench | 65.60 | 39.71 | 27.43 |
| SciCode | 3.08/22.92 | 3.08/22.92 | 3.08/15.11 |
| **Mathematics** | | | |
| GSM8K | 95.83 | 79.83 | 81.88 |
| AIME2025 | 65.83 | 62.08 | 24.17 |
| MATH 500 | 97.10 | 89.80 | 90.90 |
| **Agentic** | | | |
| SWE-bench Verified | 60.60 | 24.44 | 51.60 |
| Tau2-Retail | 67.55 | 53.51 | 62.28 |
| Tau2-Airline | 54.00 | 32.00 | 52.00 |
| Tau2-Telecom | 79.83 | 4.39 | 88.60 |
| **Long Context** | | | |
| RULER | 95.60 | 89.66 | 56.12 |

4. Deployment

[!Note] You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat, which provides OpenAI/Anthropic-compatible endpoints.

JoyAI-LLM-Flash-FP8 is currently recommended to run on the following inference engines:

vLLM

SGLang

The minimum required version of transformers is 4.57.1.

Deployment examples can be found in the Model Deployment Guide.
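Before deploying, it can help to confirm that the local environment meets the transformers version requirement. The following is a minimal sketch using only the standard library; the helper names (`release_tuple`, `meets_minimum`, `installed_transformers_ok`) are hypothetical and not part of transformers or of the JoyAI tooling, and the comparison ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = "4.57.1"  # minimum version stated in this model card


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '4.57.1.dev0' -> (4, 57, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare release segments only; pre-release suffixes are ignored."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_transformers_ok() -> bool:
    """Return True if transformers is installed and new enough, else False."""
    try:
        return meets_minimum(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `installed_transformers_ok()` at the start of a launch script gives an early, readable failure instead of an import error deep inside model loading.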

5. Model Usage

The usage demos below show how to call our official API.

For third-party APIs deployed with vLLM or SGLang, please note:

[!Note] Recommended sampling parameters: temperature=0.6, top_p=1.0
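One way to apply the recommended sampling parameters is to keep them in a single place and merge them into every request. This is a sketch, not the official client code: `RECOMMENDED_SAMPLING`, `build_chat_request`, and `chat` are hypothetical helper names, and `client` is assumed to be an `openai.OpenAI` instance pointed at an OpenAI-compatible endpoint.

```python
# Recommended sampling parameters from the note above.
RECOMMENDED_SAMPLING = {"temperature": 0.6, "top_p": 1.0}


def build_chat_request(model: str, prompt: str, max_tokens: int = 4096) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        **RECOMMENDED_SAMPLING,
    }


def chat(client, model: str, prompt: str) -> str:
    """Send one prompt through an openai.OpenAI client and return the reply text."""
    response = client.chat.completions.create(**build_chat_request(model, prompt))
    return response.choices[0].message.content
```

With a client created as `OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")`, a call looks like `chat(client, model_name, "which one is bigger, 9.11 or 9.9?")`, where `model_name` is whatever model ID the server reports.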

Chat Completion

This is a simple chat completion script that shows how to call the JoyAI-Flash API.

    from openai import OpenAI

    # Replace IP:PORT with the address of your OpenAI-compatible endpoint.
    client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


    def simple_chat(client: OpenAI):
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "which one is bigger, 9.11 or 9.9? think carefully.",
                    }
                ],
            },
        ]
        # Use the first model that the server reports.
        model_name = client.models.list().data[0].id
        response = client.chat.completions.create(
            model=model_name, messages=messages, stream=False, max_tokens=4096
        )
        print(f"response: {response.choices[0].message.content}")


    if __name__ == "__main__":
        simple_chat(client)

Tool Call Completion

This is a simple tool call completion script that shows how to call the JoyAI-Flash API.

    import json

    from openai import OpenAI

    # Replace IP:PORT with the address of your OpenAI-compatible endpoint.
    client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


    def my_calculator(expression: str) -> str:
        # eval() runs arbitrary Python; only use it with trusted input.
        return str(eval(expression))


    def rewrite(text: str) -> str:
        # Placeholder tool that returns the text unchanged. The parameter name
        # matches the "text" property declared in the tool schema below.
        return str(text)


    # Map tool names to the local functions that implement them.
    TOOL_FUNCTIONS = {"my_calculator": my_calculator, "rewrite": rewrite}


    def simple_tool_call(client: OpenAI):
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "use my functions to compute the results for the equations: 6+1",
                    },
                ],
            },
        ]
        tools = [
            {
                "type": "function",
                "function": {
                    "name": "my_calculator",
                    "description": "A calculator that can evaluate a mathematical equation and compute its results.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "expression": {
                                "type": "string",
                                "description": "The mathematical expression to evaluate.",
                            },
                        },
                        "required": ["expression"],
                    },
                },
            },
            {
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "description": "Rewrite a given text for improved clarity",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "text": {
                                "type": "string",
                                "description": "The input text to rewrite",
                            }
                        },
                        "required": ["text"],
                    },
                },
            },
        ]
        model_name = client.models.list().data[0].id
        response = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=1.0,
            max_tokens=1024,
            tools=tools,
            tool_choice="auto",
        )
        tool_calls = response.choices[0].message.tool_calls
        if not tool_calls:
            # The model answered directly without requesting a tool.
            print(response.choices[0].message.content)
            return

        # Run every requested tool so that results stay aligned with tool_calls.
        results = []
        for tool_call in tool_calls:
            function = TOOL_FUNCTIONS[tool_call.function.name]
            function_args = json.loads(tool_call.function.arguments)
            results.append(function(**function_args))

        # Send the assistant's tool calls and the tool outputs back to the model.
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for tool_call, result in zip(tool_calls, results):
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call.function.name,
                    "content": result,
                }
            )
        response = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=1.0,
            max_tokens=1024,
        )
        print(response.choices[0].message.content)


    if __name__ == "__main__":
        simple_tool_call(client)
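The `my_calculator` tool in the script above passes model-generated text to `eval`, which executes arbitrary Python. For anything beyond a local demo, a restricted evaluator may be preferable. The sketch below is one possible replacement, assuming only `+`, `-`, `*`, `/`, parentheses, and unary signs on numbers are needed; `safe_calculator` is a hypothetical name and not part of the JoyAI API.

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a parsed expression made of numbers and whitelisted ops."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(expression: str) -> str:
    """Evaluate a plain arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))
```

Because it takes the same single `expression` argument and returns a string, `safe_calculator` could be registered under the `my_calculator` tool name without changing the tool schema. Function calls, attribute access, and names raise `ValueError` instead of being executed.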

6. License

Both the code repository and the model weights are released under the Modified MIT License.



📰 Technical Report