Granite-4.1-3B

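Model Summary: Granite-4.1-3B is a 3B-parameter long-context instruct model fine-tuned from Granite-4.1-3B-Base using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets. The Granite 4.1 models use an improved post-training pipeline, including supervised fine-tuning and reinforcement learning alignment, which strengthens tool calling, instruction following, and chat capabilities.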

Developers: Granite Team, IBM
HF Collection: Granite 4.1 Language Models HF Collection
Technical Blog: Granite-4.1 Blog
GitHub Repository: ibm-granite/granite-4.1-language-models
Website: Granite Docs
Release Date: April 29, 2026
License: Apache 2.0

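Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.1 models to support additional languages.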

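Intended Use: The model is designed to follow general instructions and can serve as the foundation for AI assistants across domains, including business applications, as well as for LLM agents with tool-use capabilities.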

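Capabilities

Summarization
Text classification
Text extraction
Question answering
Retrieval-Augmented Generation (RAG)
Code-related tasks
Function-calling tasks
Multilingual dialog use cases
Fill-in-the-Middle (FIM) code completions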

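Generation: The following is a simple example of how to use the Granite-4.1-3B model.

Install the following libraries:

pip install torch torchvision torchaudio
pip install accelerate
pip install transformers

Then copy the snippet from the section that matches your use case.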

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, 
                        max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])

Expected output:

<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>Almaden Research Center, San Jose, California<|end_of_text|>

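The decoded string above contains the chat-template markers and the echoed prompt. A minimal sketch for pulling out only the assistant's reply, assuming the marker format shown in the expected output; the helper name `extract_assistant_reply` is hypothetical and not part of the Granite or transformers API:

```python
# Marker strings copied from the expected output above.
ASSISTANT_HEADER = "<|start_of_role|>assistant<|end_of_role|>"
END_OF_TEXT = "<|end_of_text|>"


def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded transcript."""
    # Everything after the last assistant header belongs to the final reply.
    _, found, tail = decoded.rpartition(ASSISTANT_HEADER)
    if not found:
        raise ValueError("no assistant turn found in decoded text")
    # Cut at the end-of-text marker if generation reached it.
    reply, _, _ = tail.partition(END_OF_TEXT)
    return reply.strip()


decoded = (
    "<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory "
    "located in the United States. You should only output its name and location."
    "<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|>"
    "Almaden Research Center, San Jose, California<|end_of_text|>"
)
print(extract_assistant_reply(decoded))
```

With the model and tokenizer loaded, a similar result can usually be obtained without string handling: slice the generated token ids from `input_tokens["input_ids"].shape[1]` onward and decode them with `skip_special_tokens=True`.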
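Tool calling: Granite-4.1-3B comes with enhanced tool-calling capabilities, enabling integration with external functions and APIs. To define a list of tools, follow OpenAI's function definition schema.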
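Before handing a tool list to the chat template, it can help to sanity-check each definition. The sketch below is a lightweight structural check only, not full JSON Schema validation; the helper `validate_tool` is hypothetical, and the sample definition mirrors the `get_current_weather` tool used in the example that follows:

```python
def validate_tool(tool: dict) -> list[str]:
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    if tool.get("type") != "function":
        problems.append('"type" should be "function"')
    function = tool.get("function")
    if not isinstance(function, dict):
        return problems + ['"function" should be an object']
    name = function.get("name")
    if not isinstance(name, str) or not name:
        problems.append('"function.name" should be a non-empty string')
    params = function.get("parameters")
    if params is not None:
        if not isinstance(params, dict) or params.get("type") != "object":
            problems.append('"function.parameters" should be a JSON Schema object')
        else:
            properties = params.get("properties", {})
            # Every required parameter must also be declared under "properties".
            for key in params.get("required", []):
                if key not in properties:
                    problems.append(f'required parameter "{key}" is not in "properties"')
    return problems


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a specified city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city"}
            },
            "required": ["city"],
        },
    },
}
print(validate_tool(weather_tool))
```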

The following example shows how to use the tool-calling ability of the Granite-4.1-3B model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# change input text as desired
chat = [
    { "role": "user", "content": "What's the weather like in Boston right now?" },
]
chat = tokenizer.apply_chat_template(chat,
                                     tokenize=False,
                                     tools=tools,
                                     add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, 
                        max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])

Expected output:

<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>

For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "Boston"}}
</tool_call><|end_of_text|>

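The model only emits the tool call as text; the calling application has to parse it and run the function. A minimal sketch of that step, assuming the `<tool_call>` format shown in the expected output above; `parse_tool_calls`, `dispatch`, `TOOL_REGISTRY`, and the stubbed `get_current_weather` body are hypothetical illustrations, not part of the Granite or transformers API:

```python
import json
import re

# Matches each <tool_call>...</tool_call> block shown in the expected output.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls from the final assistant turn of a decoded transcript."""
    # Skip the system prompt, which contains a <tool_call> format template.
    _, _, last_turn = text.rpartition("<|start_of_role|>assistant<|end_of_role|>")
    calls = []
    for payload in TOOL_CALL_RE.findall(last_turn):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            # Ignore malformed payloads rather than crash on model output.
            continue
    return calls


def get_current_weather(city: str) -> str:
    """Hypothetical stub; a real implementation would query a weather service."""
    return f"(stub) weather report for {city}"


# Registry mapping tool names to local Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def dispatch(call: dict) -> str:
    """Run one parsed tool call against the local registry."""
    func = TOOL_REGISTRY.get(call.get("name"))
    if func is None:
        raise KeyError(f"unknown tool: {call.get('name')!r}")
    return func(**call.get("arguments", {}))


assistant_text = (
    "<|start_of_role|>assistant<|end_of_role|><tool_call>\n"
    '{"name": "get_current_weather", "arguments": {"city": "Boston"}}\n'
    "</tool_call><|end_of_text|>"
)
calls = parse_tool_calls(assistant_text)
results = [dispatch(call) for call in calls]
print(calls)
print(results)
```

Looking tools up in an explicit registry, rather than by evaluating the name, keeps the model from invoking anything the application did not deliberately expose.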
Evaluation Results:

Benchmarks	Metric	3B Dense	8B Dense	30B Dense
General Tasks
MMLU	5-shot	67.02	73.84	80.16
MMLU-Pro	5-shot, CoT	49.83	55.99	64.09
BBH	3-shot, CoT	75.83	80.51	83.74
AGI EVAL	0-shot, CoT	65.16	72.43	77.80
GPQA	0-shot, CoT	31.70	41.96	45.76
SimpleQA		3.68	4.82	6.81
Alignment Tasks
AlpacaEval 2.0		38.57	50.08	56.16
IFEval Avg		82.30	87.06	89.65
ArenaHard		37.80	68.98	71.02
MTBench Avg		7.57	8.61	8.61
Math Tasks
GSM8K	8-shot	86.88	92.49	94.16
GSM Symbolic	8-shot	81.32	83.70	75.70
Minerva Math	0-shot, CoT	67.94	80.10	81.32
DeepMind Math	0-shot, CoT	64.64	80.07	81.93
Code Tasks
HumanEval	pass@1	81.71	85.37	88.41
HumanEval+	pass@1	76.83	79.88	85.37
MBPP	pass@1	71.16	87.30	85.45
MBPP+	pass@1	62.17	73.81	73.54
CRUXEval-O	pass@1	40.75	47.63	55.75
BigCodeBench	pass@1	32.19	35.00	38.77
MULTIPLE	pass@1	52.54	60.26	62.31
Eval+ Avg	pass@1	67.05	80.21	82.66
Tool Calling Tasks
BFCL v3		60.80	68.27	73.68
Multilingual Tasks
MMMLU	5-shot	57.61	64.84	73.71
INCLUDE	5-shot	52.05	58.89	67.26
MGSM	8-shot	70.00	82.32	71.12
Safety
SALAD-Bench		93.95	95.80	96.41
AttaQ		81.88	81.19	85.76
Tulu3 Safety Eval Avg		66.84	75.57	78.19

**Multilingual benchmarks and the languages they include:**
Benchmarks	# Langs	Languages
MMMLU	11	ar, de, en, es, fr, ja, ko, pt, zh, bn, hi
INCLUDE	14	hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh
MGSM	5	en, es, fr, ja, zh

Model Architecture:

Granite-4.1-3B is built on a decoder-only dense Transformer architecture. Its core components are GQA, RoPE, an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings.

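Model	3B Dense	8B Dense	30B Dense
Embedding size	2560	4096	4096
Number of layers	40	40	64
Attention head size	64	128	128
Number of attention heads	40	32	32
Number of KV heads	8	8	8
MLP / shared expert hidden size	8192	12800	32768
MLP activation	SwiGLU	SwiGLU	SwiGLU
Sequence length	131072	131072	131072
Position embedding	RoPE	RoPE	RoPE
# Parameters	3B	8B	30B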
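The GQA settings in the architecture table determine how much memory the key/value cache needs at long context. A rough back-of-the-envelope sketch, assuming a 2-byte (bf16/fp16) cache, one cached key vector and one value vector per KV head per layer, and no cache quantization or paging overhead; these assumptions are illustrative and are not stated in the model card:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    name: str
    num_layers: int
    head_dim: int
    num_attention_heads: int
    num_kv_heads: int


# Values copied from the architecture table.
CONFIGS = [
    ModelConfig("3B Dense", 40, 64, 40, 8),
    ModelConfig("8B Dense", 40, 128, 32, 8),
    ModelConfig("30B Dense", 64, 128, 32, 8),
]

SEQUENCE_LENGTH = 131072  # maximum sequence length from the table
BYTES_PER_VALUE = 2       # assumption: bf16/fp16 KV cache


def kv_cache_bytes_per_token(cfg: ModelConfig, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to cache keys and values for one token across all layers."""
    # Factor 2 accounts for storing both a key and a value vector.
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * bytes_per_value


for cfg in CONFIGS:
    per_token = kv_cache_bytes_per_token(cfg)
    full_context_gib = per_token * SEQUENCE_LENGTH / 2**30
    # GQA shares each KV head among several query heads, shrinking the cache.
    group_size = cfg.num_attention_heads // cfg.num_kv_heads
    print(f"{cfg.name}: {per_token // 1024} KiB/token, "
          f"{full_context_gib:.0f} GiB at {SEQUENCE_LENGTH} tokens, "
          f"{group_size} query heads per KV head")
```

Under these assumptions, a single full-length sequence needs roughly 10 GiB, 20 GiB, and 32 GiB of cache for the 3B, 8B, and 30B models respectively. With one KV head per query head (no GQA), the 3B figure would be 5 times larger.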

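Training Data: Overall, our supervised fine-tuning (SFT) data comes from three key sources: (1) publicly available datasets with permissive licenses, (2) internal synthetic data targeting specific capabilities, and (3) a selection of human-curated data.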

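Supervised Fine-Tuning and Reinforcement Learning: The instruct model was fine-tuned with substantially improved supervised fine-tuning (SFT) and reinforcement learning (RL) pipelines, using the mix of high-quality datasets described above. Through a rigorous SFT-RL cycle, we improved the tool-calling, instruction-following, and chat capabilities of the Granite-4.1 models. For more details, see our Granite-4.1 blog.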

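Infrastructure: We trained the Granite 4.1 language models on an NVIDIA GB200 NVL72 cluster hosted by CoreWeave. Intra-rack communication runs over the 72-GPU NVLink domain, and a non-blocking, full fat-tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster gives us a scalable and efficient infrastructure for training our models across thousands of GPUs.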

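Ethical Considerations and Limitations: Granite 4.1 instruct models are fine-tuned primarily on instruction-response pairs that are mostly in English, along with multilingual data covering multiple languages. Although the model can handle multilingual dialog use cases, its performance may not match its performance on English tasks. In such cases, supplying a small number of examples (few-shot) can help the model generate more accurate outputs. While the model has been aligned with safety in mind, it may in some cases produce inaccurate, biased, or unsafe responses to user prompts. We therefore urge the community to apply proper safety testing and tuning tailored to their specific tasks when using this model. To enhance safety in enterprise deployments, we recommend using Granite 4.1 language models alongside Granite Guardian, a model designed to detect and flag risks in inputs and outputs across the key dimensions outlined in the IBM AI Risk Atlas.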
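The few-shot suggestion for non-English tasks can be applied by prepending worked examples as earlier chat turns. A minimal sketch, assuming the same `role`/`content` message format used in the generation example; the helper `build_few_shot_chat` and the German sentiment examples are hypothetical illustrations, not taken from the Granite documentation:

```python
def build_few_shot_chat(examples, query):
    """Build a chat message list with few-shot demonstrations before the query.

    `examples` is a sequence of (user_text, assistant_text) pairs.
    """
    chat = []
    for user_text, assistant_text in examples:
        # Each demonstration becomes one user turn followed by one assistant turn.
        chat.append({"role": "user", "content": user_text})
        chat.append({"role": "assistant", "content": assistant_text})
    # The real question goes last so the model answers it next.
    chat.append({"role": "user", "content": query})
    return chat


# Hypothetical German sentiment-classification demonstrations.
examples = [
    ("Klassifiziere die Stimmung: 'Der Service war ausgezeichnet.'", "positiv"),
    ("Klassifiziere die Stimmung: 'Das Essen war kalt und fade.'", "negativ"),
]
chat = build_few_shot_chat(examples, "Klassifiziere die Stimmung: 'Ich komme gerne wieder.'")
print(len(chat), [m["role"] for m in chat])
```

The resulting list can be passed to `tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)` in the same way as the single-turn chat in the generation example.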

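Resources

⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources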
