/ˈɑː.pri.əl/
Apriel-1.6-15B-Thinker is the updated multimodal reasoning model in ServiceNow's Apriel SLM series, built on top of Apriel-1.5-15B-Thinker. Apriel-1.6 delivers substantially improved text and image reasoning, achieving competitive performance against models 10x its size. Like its predecessor, it benefits from extensive continual pretraining across both text and image domains. We additionally performed post-training focused on supervised fine-tuning (SFT) and reinforcement learning (RL). Apriel-1.6 achieves frontier performance without sacrificing reasoning token efficiency: compared to Apriel-1.5-15B-Thinker, the model improves or maintains task performance while cutting reasoning token usage by more than 30%.
Highlights
Structured output markers (`<tool_calls>`, `</tool_calls>`, `[BEGIN FINAL RESPONSE]`, `<|end|>`) make output parsing easier. See our blog post for more details.
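As a quick illustration (a minimal sketch; the helper name is ours, not an official API), the tool-call payloads can be pulled out with a standard regular expression over these markers:

```python
import re

# Sketch: extract tool-call payloads emitted between the
# <tool_calls> ... </tool_calls> markers. Illustrative only.
def extract_tool_calls(output: str) -> list[str]:
    return re.findall(r"<tool_calls>(.*?)</tool_calls>", output, re.DOTALL)
```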
Text benchmarks included in the Artificial Analysis Index v3.0 use the scores reported by Artificial Analysis. All other benchmarks are internal evaluations.
| Category | Benchmark | Apriel-1.6-15B-Thinker | Apriel-1.5-15B-Thinker | GPT OSS 120B | DeepSeek R1 0528 | Gemini 2.5 Flash (Sep) | GPT 5 mini (high) | Claude 4.5 Sonnet (thinking) | o3-mini (high) |
|---|---|---|---|---|---|---|---|---|---|
| | Average Score** | 53.62 | 46.56 | 52.56 | 51.92 | 50.71 | 62.58 | 60.37 | 48.85 |
| Function Calling | BFCL v3 only | 63.50 | 51.88 | 50.62 | 39.75 | 39.75 | 17.62 | - | 50 |
| | Tau2 bench Telecom | 69 | 57.8 | 66 | 37 | 32 | 68 | 50.8 | 31 |
| | Tau2 bench Retail | 66.67 | 46.78 | 61.4 | 59.94 | 61.69 | 73.39 | 69.8 | 75.73 |
| | Tau2 bench Airline | 58 | 52 | 45.3 | 47.33 | 56.66 | 59.33 | 58 | 61.33 |
| | ComplexFuncBench | 33.2 | 19 | 24.6 | 24.2 | 26.3 | 37.5 | 24.6 | 18.9 |
| Instruction Following | Agent IF | 57.2 | 55 | 54.20 | 52.20 | 49.70 | 57.60 | 54.50 | 54.90 |
| | Multi IF | 83.34 | 76.91 | 82.95 | 73.76 | 82.49 | 85.37 | 84.32 | 87.28 |
| | Multi-Challenge | 46.15 | 41.39 | 46.90 | 44.50 | 49.08 | 57.90 | 42.49 | 38.46 |
| | IF Bench | 69 | 62 | 69 | 40 | 50 | 75 | 57 | 70.07 |
| Math | AIME 25 | 88 | 88 | 93 | 76 | 73 | 91 | 88 | 86.67 |
| Coding | Struct Eval | 79 | 48.50 | 71 | 73 | 70 | 69.92 | 76 | 73 |
| | LCB | 81 | 73 | 88 | 77 | 70 | 84 | 71 | 73 |
| | SciCode | 37 | 35 | 39 | 40 | 41 | 39 | 45 | 40 |
| Agentic | DeepresearchBench | 36.47 | 32.73 | 36.30 | 34.19 | 38.15 | - | - | 33.40 |
| | GAIA | 40 | 30.91 | 21.21 | 32.12 | 47.88 | 65.45 | 69.09 | 23.03 |
| | Work-Arena L1 | 59.1 | 51.5 | 50.9 | 63.9 | 51.8 | 65.5 | 62.7 | 52.4 |
| | OS World Small | 16.70 | 13.90 | 16.70 | 25 | 19.40 | 22.20 | 30.60 | 19.40 |
| | SWE Bench Verified | 23 | 16 | 31 | 29.60 | 34.20 | 61 | 64.2 | 22.60 |
| | Terminal Bench | 14 | 10 | 22 | 15 | 13 | 31 | 33 | 5.67 |
| | Aider Polyglot | 37.68 | 26.37 | 42 | 71.40 | 40 | 71.60 | 78 | 60.40 |
| Knowledge | MMLU Pro | 79 | 77 | 81 | 85 | 83 | 84 | 88 | 80 |
| Creative Writing | Creative writing v3 / EQ Bench | 59.73 | 60.24 | 53.70 | 79.40 | 74.25 | 75.25 | 80.70 | 30.40 |
| Other | GPQA Diamond | 73 | 71 | 78 | 81 | 79 | 83 | 83 | 77 |
| | HLE | 10 | 12 | 18.5 | 14.9 | 11.1 | 19.7 | 17.3 | 12.3 |
| Long Context | AA LCR | 50* | 20 | 51 | 55 | 62 | 68 | 66 | 30*** |
* This score was obtained with DCA enabled. Without it, the model scores 36.
** The average score is computed over all benchmarks except BFCL v3 Only and DeepResearchBench, since scores for these two are not available for some models.
*** The AA LCR score for o3-mini-high is a predicted value based on its AA index score.
| Benchmark | Apriel-1.6-15B-Thinker | Apriel-1.5-15B-Thinker | GPT-5 (high) | GLM-4.5V (Thinking) | Gemini 2.5 Flash (high) | Claude Sonnet 3.7 (Thinking) | GPT-5 (Minimal) | Grok 4 Fast (Thinking) |
|---|---|---|---|---|---|---|---|---|
| MMMU (validation) | 72 | 70.22 | 81.33 | 74.33 | 70.66 | 73.66 | 66.66 | 70.11 |
| MMMU-PRO (10 choice) | 60.28 | 55.38 | 74.73 | 64.16 | 67.86 | 64.50 | 66.06 | 61.61 |
| MMMU-PRO (Vision Only) | 52.89 | 48.21 | 66.93 | 61.50 | 56.76 | 60.11 | 57.68 | 22.94 |
| LogicVista | 58.61 | 58.39 | 69.35 | 63.53 | 63.75 | 69.12 | 44.51 | 47.42 |
| MathVision | 60.85 | 50.99 | 67.10 | 59.53 | 59.21 | 50.32 | 35.52 | 48.35 |
| MathVista | 79.90 | 75.50 | 83.30 | 83.60 | 78.50 | 74.60 | 61.20 | 68.20 |
| MathVerse (Vision Dominant) | 66.75 | 58.38 | 79.82 | 68.65 | 70.68 | 56.09 | 39.84 | 54.69 |
| MathVerse (Text Dominant) | 79.06 | 76.40 | 84.64 | 77.41 | 78.80 | 69.28 | 43.78 | 72.20 |
| MMStar | 70.66 | 67.73 | 77.74 | 74.46 | 73.86 | 70 | 63.60 | 64.80 |
| CharXiv (descriptive) | 89.85 | 88.20 | 91.25 | 90.80 | 83.60 | 93.27 | 82.45 | 68.15 |
| CharXiv (reasoning) | 56.00 | 50.10 | 71.50 | 63.00 | 56.50 | 70.90 | 52.80 | 33.50 |
| AI2D Test | 86.04 | 82.87 | 90.05 | 87.75 | 82.09 | 84.19 | 85.16 | 81.86 |
| BLINK | 63.96 | 58.71 | 70.22 | 66.59 | 65.64 | 64.49 | 64.59 | 54.39 |
The Apriel family of models is designed for a variety of general-purpose instruction tasks.
They are not recommended for use in safety-critical applications without human oversight, or in scenarios that require guaranteed factual accuracy.
pip install transformers

The following snippet demonstrates how to use the model via the transformers library's `generate` function:
# Tested with transformers==4.48
import re
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText
# Load model
model_id = "ServiceNow-AI/Apriel-1.6-15b-Thinker"
model = AutoModelForImageTextToText.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
# Example 1: Text-only prompt
chat = [
{
"role": "user",
"content": [
{"type": "text", "text": "What is the capital for France?"},
],
}
]
inputs = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
inputs.pop("token_type_ids", None)
with torch.no_grad():
output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
generated_ids = output_ids[:, inputs['input_ids'].shape[1]:]
output = processor.decode(generated_ids[0], skip_special_tokens=True)
response = re.findall(r"
$$BEGIN FINAL RESPONSE$$
(.*?)(?:<\|end\|>)", output, re.DOTALL)[0].strip()
print("Text-only Response:", response)
# Example 2: Image understanding
url = "https://picsum.photos/id/237/200/300"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
chat = [
{
"role": "user",
"content": [
{"type": "text", "text": "Which animal is this?"},
{"type": "image"},
],
}
]
prompt = processor.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
with torch.no_grad():
output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
generated_ids = output_ids[:, inputs['input_ids'].shape[1]:]
output = processor.decode(generated_ids[0], skip_special_tokens=True)
response = re.findall(r"
$$BEGIN FINAL RESPONSE$$
(.*?)(?:<\|end\|>)", output, re.DOTALL)[0].strip()
print("Image Response:", response)
We recommend setting the sampling temperature to 0.6. The model's response should begin with `Here are my reasoning steps:\n`; this is already implemented in the default chat template, which has the following structure:

<|begin_system|>
You are a thoughtful, systematic AI assistant from ServiceNow Language Models (SLAM) lab. Analyze each question carefully, present your reasoning step-by-step, then provide the final response after the marker [BEGIN FINAL RESPONSE].
<|begin_user|>
# user message here
<|begin_assistant|>
Here are my reasoning steps:
# thoughts here
[BEGIN FINAL RESPONSE]
# assistant response here
<|end|>

The model will first generate its thinking process, then the final response, which begins with [BEGIN FINAL RESPONSE]. The following snippet shows how the chat template is applied:
from transformers import AutoTokenizer
model_name = "ServiceNow-AI/Apriel-1.6-15b-Thinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# prepare the model input
custom_system_prompt = "Answer like a pirate."
prompt = "You are an expert assistant in the implementation of customer experience management aspect of retail applications \n \nYou will be using Python as the programming language. \n \nYou will utilize a factory design pattern for the implementation and following the dependency inversion principle \n \nYou will modify the implementation based on user requirements. \n \nUpon user request, you will add, update, and remove the features & enhancements in the implementation provided by you. \n \nYou will ask whether the user wants to refactor the provided code or needs a sample implementation for reference. Upon user confirmation, I will proceed accordingly. \n \n**Guidelines:** \n 1. **User Requirements:** \n - You have to ask users about their requirements, clarify the user expectations, and suggest the best possible solution by providing examples of Python code snippets. \n - Ask users about which type of reports they need to assess the AI model's performance, accuracy, and reliability. \n - After providing the solution, you have to ask the user about the trial of the solution and modify the solution based on the user feedback. \n \n 2. **Libraries/Frameworks:** \n - You will be utilizing Python as a programming language. \n - You will be using Flask framework for REST APIS implementation \n \n 3. **Communication Gesture:** \n - Your conversation with the user should be interactive, supportive, courageous, and professional. \n - You have to break down the complex concepts into sub-concepts and try to explain them to the user. \n - You have to ask the user for the required parameters. If the user refuses to provide in 2 attempts, politely exit the conversation. \n - You have to provide your supported parameters to the user, if the user refuses to accept them then you have to put an apology note and exit the conversation. \n - You have to track the conversation about unasked questions by the user. If some/one of the questions remain then you have to remind the user about these questions and proceed to answer them based on the user's confirmation \n \n 4. **Implementation:** \n - Your code/implementations should be reliable, scaleable, modular, and reusable. \n - You will be providing unit tests for the implementation upon user request. \n - You will be following MVC architecture for the applications \n - Your implementations must be well-commented and readable \n \n \n- Today's date is 23rd August 2024. \n- The default sender email is sender-assistant@email.com.\nHi, I am conducting research on retail customer feedback systems and I need assistance with designing and implementing them. Could you kindly provide me with a list of general customer feedback system modules?"
messages = [
{"role": "user", "content": custom_system_prompt + "\n\n" + prompt}
]
# example tools
tools = [{"type": "function", "function": {"name": "getRetailFeedbackModules", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"page": {"type": "integer", "description": "The current page number.", "default": 1}, "page_size": {"type": "integer", "description": "The number of items per page.", "default": 3}}}}}, {"type": "function", "function": {"name": "verifyImplementation", "description": "Returns the list of modules usually present in the retail industry", "parameters": {"type": "object", "properties": {"coding_language": {"type": "string", "description": "The supported languages for verification of implementation.", "default": "python", "enum": ["python", "java", "php"]}, "code": {"type": "string", "description": "The code which needs verification"}, "design_pattern": {"type": "string", "description": "The design pattern to verify in the implementation", "enum": ["factory", "strategy", "singleton"]}, "verify_best_practices": {"type": "boolean", "description": "The verification of the coding style based on the language selected", "default": true}}}}}]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
tools=tools
)
model_inputs = tokenizer([text], return_tensors="pt")

Since the upstream PR has not yet been merged, you can use this custom image as an alternative way to run the model with the tool and reasoning parsers enabled:
docker.io/amant555/vllm_apriel:latest

python3 -m vllm.entrypoints.openai.api_server \
--model ServiceNow-AI/Apriel-1.6-15b-Thinker \
--served-model-name Apriel-1p6-15B-Thinker \
--trust_remote_code \
--max-model-len 131072 \
--enable-auto-tool-choice \
--tool-call-parser apriel \
    --reasoning-parser apriel

Apriel-1.6-15b-Thinker is available for hosted inference on Together AI and can also be run locally with Ollama.
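Once the server is running, it exposes an OpenAI-compatible API. Here is a minimal client sketch, assuming the server listens on vLLM's default localhost:8000 and uses the served model name from the launch command above:

```python
from openai import OpenAI

# Assumes the vLLM server started above is listening on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Apriel-1p6-15B-Thinker",  # --served-model-name from the launch command
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.6,  # recommended sampling temperature
)
print(resp.choices[0].message.content)
```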
Continual pretraining: billions of tokens covering math, code, science, logical reasoning, and multimodal image-text data.
Supervised fine-tuning (SFT): 2.4M samples spanning math, code, instruction following, function calling, and conversation, followed by an incremental lightweight multimodal SFT pass.
Reinforcement learning (RL): multi-stage RL with verifiable rewards using the GSPO algorithm, applied to both text and vision tasks. Our RL stage optimizes reasoning efficiency: it cuts token consumption by removing unnecessary intermediate steps, stops reasoning early when confidence is high, and answers simple queries directly.
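The card does not spell out the reward implementation; as a toy illustration of what a verifiable reward can look like (helper names are ours; the final-response marker comes from the chat template above):

```python
import re

def extract_final_answer(output: str) -> str:
    # The final response follows the [BEGIN FINAL RESPONSE] marker.
    m = re.search(r"\[BEGIN FINAL RESPONSE\](.*?)(?:<\|end\|>|$)", output, re.DOTALL)
    return m.group(1).strip() if m else ""

def verifiable_reward(output: str, reference: str) -> float:
    # Binary reward: 1.0 on an exact match with the reference answer, else 0.0.
    return 1.0 if extract_final_answer(output) == reference.strip() else 0.0
```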
For more details on the training methodology, see our blog post.
Safety responsibility:
Deployers and users are strongly encouraged to align their safety practices with established frameworks and regulatory guidance such as the EU AI Act and the NIST AI Risk Management Framework (RMF).
Disclaimer:
Users bear responsibility for the safe deployment, management, and use of this open-source LLM. The model is provided "as is", without express or implied warranties as to its safety or suitability for any particular application or environment.
MIT
@misc{radhakrishna2025apriel1515bthinker,
title={Apriel-1.5-15b-Thinker},
author={Shruthan Radhakrishna and Aman Tiwari and Aanjaneya Shukla and Masoud Hashemi and Rishabh Maheshwary and Shiva Krishna Reddy Malay and Jash Mehta and Pulkit Pattnaik and Saloni Mittal and Khalil Slimi and Kelechi Ogueji and Akintunde Oladipo and Soham Parikh and Oluwanifemi Bamgbose and Toby Liang and Ahmed Masry and Khyati Mahajan and Sai Rajeswar Mudumba and Vikas Yadav and Sathwik Tejaswi Madhusudhan and Torsten Scholak and Sagar Davasam and Srinivas Sunkara and Nicholas Chapados},
year={2025},
eprint={2510.01141},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2510.01141},
}