📰 Technical Report
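JoyAI-LLM Flash is a leading medium-sized instruction-tuned language model with 3 billion activated parameters and 48 billion total parameters. It was pre-trained on 20 trillion text tokens with the Muon optimizer, then further refined through large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM Flash performs strongly on frontier knowledge, reasoning, coding, and agentic tasks.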
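
| Item | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 48B |
| Activated Parameters | 3B |
| Number of Layers (including dense layers) | 40 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 2048 |
| MoE Hidden Dimension (per expert) | 768 |
| Number of Attention Heads | 32 |
| Number of Experts | 256 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 129K |
| Context Length | 128K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |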
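
To make the routing figures in the table concrete, here is a minimal sketch of top-k expert selection for a single token. The table only states 256 experts with 8 selected per token; the softmax gating, the renormalization over the selected experts, and the names `route_token`, `NUM_EXPERTS`, and `TOP_K` are assumptions for illustration and may differ from the model's actual router.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, taken from the table
TOP_K = 8          # experts selected per token, taken from the table


def route_token(router_logits: list[float], top_k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the top_k experts for one token and return (expert_id, weight) pairs.

    Assumption: softmax gating renormalized over the selected experts only.
    """
    # Indices of the top_k largest router logits.
    chosen = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )[:top_k]
    # Softmax restricted to the chosen experts (subtract the max for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    selection = route_token(logits)
    print(f"{len(selection)} of {NUM_EXPERTS} routed experts active for this token")
    for expert_id, weight in selection:
        print(f"expert {expert_id:3d} weight {weight:.3f}")
```

Only the selected experts (plus the shared expert) run for a given token, which is why the activated parameter count (3B) is a small fraction of the total (48B).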

| Benchmark | JoyAI-LLM Flash | Qwen3-30B-A3B-Instruct-2507 | GLM-4.7-Flash (Non-thinking) |
|---|---|---|---|
| **Knowledge & Alignment** | | | |
| MMLU | 89.50 | 86.87 | 80.53 |
| MMLU-Pro | 81.02 | 73.88 | 63.62 |
| CMMLU | 87.03 | 85.88 | 75.85 |
| GPQA-Diamond | 74.43 | 68.69 | 39.90 |
| SuperGPQA | 55.00 | 52.00 | 32.00 |
| LiveBench | 72.90 | 59.70 | 43.10 |
| IFEval | 86.69 | 83.18 | 82.44 |
| AlignBench | 8.24 | 8.07 | 6.85 |
| HellaSwag | 91.79 | 89.90 | 60.84 |
| **Coding** | | | |
| HumanEval | 96.34 | 95.12 | 74.39 |
| LiveCodeBench | 65.60 | 39.71 | 27.43 |
| SciCode | 3.08/22.92 | 3.08/22.92 | 3.08/15.11 |
| **Mathematics** | | | |
| GSM8K | 95.83 | 79.83 | 81.88 |
| AIME2025 | 65.83 | 62.08 | 24.17 |
| MATH 500 | 97.10 | 89.80 | 90.90 |
| **Agentic** | | | |
| SWE-bench Verified | 60.60 | 24.44 | 51.60 |
| Tau2-Retail | 67.55 | 53.51 | 62.28 |
| Tau2-Airline | 54.00 | 32.00 | 52.00 |
| Tau2-Telecom | 79.83 | 4.39 | 88.60 |
| **Long Context** | | | |
| RULER | 95.60 | 89.66 | 56.12 |

> [!Note]
> You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat, which provides OpenAI/Anthropic-compatible endpoints.

Currently, running JoyAI-LLM-Flash-FP8 is recommended on the following inference engines:
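The minimum required version of transformers is 4.57.1.
Deployment examples can be found in the Model Deployment Guide.
The usage demos below show how to call our official API.
For third-party APIs deployed with vLLM or SGLang, please note: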
> [!Note]
> Recommended sampling parameters: temperature=0.6, top_p=1.0
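
One way to apply the recommended sampling parameters consistently is to keep them in a single place and merge them into every request. The sketch below assumes an OpenAI-compatible client; the helper name `build_chat_request` and the model id `my-deployed-model` are hypothetical and not part of the JoyAI-LLM Flash API.

```python
from typing import Any

# Recommended sampling parameters from the note above.
RECOMMENDED_SAMPLING = {"temperature": 0.6, "top_p": 1.0}


def build_chat_request(
    model: str, messages: list[dict[str, Any]], **overrides: Any
) -> dict[str, Any]:
    """Return keyword arguments for client.chat.completions.create()."""
    request = {"model": model, "messages": messages, **RECOMMENDED_SAMPLING}
    request.update(overrides)  # explicit caller settings win over the defaults
    return request


if __name__ == "__main__":
    request = build_chat_request(
        model="my-deployed-model",  # hypothetical model id
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=256,
    )
    print(request)
    # With an OpenAI-compatible client this would be sent as:
    #   client.chat.completions.create(**request)
```

Passing `temperature` or `top_p` explicitly to the helper overrides the defaults for a single call.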
Here is a simple chat completion script showing how to call the JoyAI-Flash API.

    from openai import OpenAI

    # Replace IP:PORT with the address of your OpenAI-compatible endpoint.
    client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


    def simple_chat(client: OpenAI):
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "which one is bigger, 9.11 or 9.9? think carefully.",
                    }
                ],
            },
        ]
        # Use the first model served by the endpoint.
        model_name = client.models.list().data[0].id
        response = client.chat.completions.create(
            model=model_name, messages=messages, stream=False, max_tokens=4096
        )
        print(f"response: {response.choices[0].message.content}")


    if __name__ == "__main__":
        simple_chat(client)

Here is a simple tool-calling completion script showing how to call the JoyAI-Flash API.

    import json

    from openai import OpenAI

    # Replace IP:PORT with the address of your OpenAI-compatible endpoint.
    client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


    def my_calculator(expression: str) -> str:
        # eval() is used for brevity only; never call it on untrusted input.
        return str(eval(expression))


    def rewrite(text: str) -> str:
        # Placeholder tool: returns the text unchanged.
        return str(text)


    # Map tool names to their Python implementations.
    TOOL_MAP = {"my_calculator": my_calculator, "rewrite": rewrite}


    def simple_tool_call(client: OpenAI):
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "use my functions to compute the results for the equations: 6+1",
                    },
                ],
            },
        ]
        tools = [
            {
                "type": "function",
                "function": {
                    "name": "my_calculator",
                    "description": "A calculator that can evaluate a mathematical equation and compute its results.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "expression": {
                                "type": "string",
                                "description": "The mathematical expression to evaluate.",
                            },
                        },
                        "required": ["expression"],
                    },
                },
            },
            {
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "description": "Rewrite a given text for improved clarity",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "text": {
                                "type": "string",
                                "description": "The input text to rewrite",
                            }
                        },
                        "required": ["text"],
                    },
                },
            },
        ]
        model_name = client.models.list().data[0].id
        response = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=1.0,
            max_tokens=1024,
            tools=tools,
            tool_choice="auto",
        )
        message = response.choices[0].message
        tool_calls = message.tool_calls
        if not tool_calls:
            # The model answered directly without calling a tool.
            print(message.content)
            return

        # Run every requested tool so that results stay aligned with tool_calls.
        results = []
        for tool_call in tool_calls:
            tool_fn = TOOL_MAP[tool_call.function.name]
            results.append(tool_fn(**json.loads(tool_call.function.arguments)))

        # Send the assistant's tool calls and the tool outputs back to the model.
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for tool_call, result in zip(tool_calls, results):
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": tool_call.function.name,
                    "content": result,
                }
            )
        response = client.chat.completions.create(
            model=model_name,
            messages=messages,
            temperature=1.0,
            max_tokens=1024,
        )
        print(response.choices[0].message.content)


    if __name__ == "__main__":
        simple_tool_call(client)

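
The calculator tool in the script above evaluates model-generated text with `eval`, which is risky outside a demo. As one possible alternative, the sketch below parses the expression with the standard-library `ast` module and allows only basic arithmetic. The name `safe_calculator` and the choice of permitted operators are assumptions for illustration, not part of the original script.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> float:
    """Recursively evaluate a parsed arithmetic expression."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(expression: str) -> str:
    """Evaluate +, -, *, / on numeric literals; reject everything else."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))


if __name__ == "__main__":
    print(safe_calculator("6+1"))  # prints 7
    try:
        safe_calculator("__import__('os').getcwd()")
    except ValueError as err:
        print(f"rejected: {err}")
```

Because it takes the same single `expression` argument and returns a string, it could stand in for `my_calculator` without changing the tool schema.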
Both the code repository and the model weights are released under the Modified MIT License.