
LFM2.5-8B-A1B

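[!IMPORTANT] ⚠️ Important: The tokenizer was updated after the original release to fix a tool-calling issue in llama.cpp. If you downloaded LFM2.5-8B-A1B before commit feb5e04, please re-download the tokenizer files. The GGUF files have also been reconverted with the updated tokenizer.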

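LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.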

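- On-device personal assistant: Built for real-world use cases, able to chain tool calls and carry out complex instructions on a wide range of devices.
- Compressed performance: Competitive with much larger dense and Mixture-of-Experts (MoE) models on instruction following and agentic tasks.
- Exceptional throughput: The fastest CPU and GPU inference among models of the same size, with day-one support for llama.cpp, MLX, vLLM, and SGLang.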

For more information about LFM2.5-8B-A1B, see our blog post.

The AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See Artificial Analysis for more results.

🗒️ Model Details

Model	Parameters	Description
LFM2.5-8B-A1B-Base	8.3B total / 1.5B active	Pre-trained base model for fine-tuning
LFM2.5-8B-A1B	8.3B total / 1.5B active	General-purpose model tuned for reasoning

LFM2.5-8B-A1B is a text-only, general-purpose model with the following characteristics:

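- Total parameters: 8.3B
- Active parameters: 1.5B
- Layers: 24 (18 double-gated LIV convolution layers + 6 GQA layers)
- Training data: 38 trillion tokens
- Context length: 128,000 tokens
- Vocabulary size: 128,000
- Supported languages: English, Arabic, Chinese, French, German, Italian, Japanese, Korean, Portuguese, Spanish
- Generation parameters: we recommend the following settings:
  - temperature: 0.2
  - top_k: 80
  - repetition_penalty: 1.05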
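
To make the three recommended settings concrete, the sketch below shows how a sampler typically applies a repetition penalty, a temperature, and a top-k cutoff to raw next-token scores. It is an illustration in plain Python, not the code of any specific inference framework: the function name `adjust_logits` and the toy scores are hypothetical, and real implementations may apply the steps in a different order.

```python
import math


def adjust_logits(logits, generated_ids, temperature=0.2, top_k=80,
                  repetition_penalty=1.05):
    """Return next-token probabilities after the three adjustments (illustrative)."""
    scores = list(logits)

    # 1) Repetition penalty: make tokens that were already generated less likely.
    for token_id in set(generated_ids):
        score = scores[token_id]
        scores[token_id] = (score / repetition_penalty if score > 0
                            else score * repetition_penalty)

    # 2) Temperature: values below 1 sharpen the distribution.
    scores = [score / temperature for score in scores]

    # 3) Top-k: keep only the k highest-scoring tokens.
    if top_k < len(scores):
        cutoff = sorted(scores, reverse=True)[top_k - 1]
        scores = [s if s >= cutoff else float("-inf") for s in scores]

    # Softmax over the surviving scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary of four tokens; token 0 has already been generated once.
probs = adjust_logits([2.0, 1.0, 0.5, -1.0], generated_ids=[0], top_k=2)
print(probs)
```

With a temperature as low as 0.2, most of the probability mass ends up on the top-scoring token, which is why these settings give fairly deterministic output.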

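Model	Description
LFM2.5-8B-A1B	Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang.
LFM2.5-8B-A1B-GGUF	Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment.
LFM2.5-8B-A1B-ONNX	ONNX Runtime format for cross-platform deployment.
LFM2.5-8B-A1B-MLX	MLX format for Apple Silicon. Optimized for fast inference on Mac devices.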

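We recommend LFM2.5-8B-A1B for agentic workflows, tool calling, structured output, multilingual assistants, and on-device personal assistant applications. Without retrieval augmentation, it is not the best choice for complex programming tasks or knowledge-intensive question answering.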

Chat Template

LFM2.5 uses a ChatML-like format. See the chat template documentation for details. Example:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant

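Because LFM2.5-8B-A1B is a reasoning model, the assistant reply contains an explicit chain of thought before the final answer. You can use tokenizer.apply_chat_template() to format messages automatically.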
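
To show what the template produces, here is a minimal sketch that renders a message list into the layout of the example above using plain string formatting. It is only an illustration: the helper `format_chat` is hypothetical, it handles plain text turns only, and the tokenizer's own chat template remains the reference for real use.

```python
def format_chat(messages, add_generation_prompt=True):
    """Render messages in the ChatML-like layout shown above (illustrative only)."""
    text = "<|startoftext|>"
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
]
prompt = format_chat(messages)
print(prompt)
```

In practice, prefer tokenizer.apply_chat_template(), which also covers tool definitions and any details this sketch leaves out.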

Tool Use

LFM2.5 supports function calling in four steps:

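- Function definition: Provide the list of tools as a JSON object in the system prompt, or pass it through the tools=... argument of tokenizer.apply_chat_template().
- Function call: By default, LFM2.5 writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant answer. You can override this behavior by asking the model, in the system prompt, to output function calls as JSON.
- Function execution: Execute the call and return the result under the tool role.
- Final answer: LFM2.5 interprets the tool output and returns a plain-text answer to the original prompt.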

See the tool use documentation for the full guide. Example:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
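
The application side of this exchange can be sketched in a few lines of Python: build the system prompt, pull the Pythonic call out of the assistant turn, run it, and wrap the result as a tool message. This is a minimal sketch under stated assumptions, not the official client code: the helper names, the `REGISTRY` dictionary, and the stub `get_candidate_status` implementation are hypothetical, and only keyword arguments with literal values are handled.

```python
import ast
import json
import re

TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def build_system_prompt(tools):
    """Step 1: expose the tool list in the system prompt, as in the example above."""
    return "List of tools: " + json.dumps(tools)


def parse_tool_calls(assistant_text):
    """Step 2: return (name, kwargs) pairs from a Pythonic tool-call list."""
    calls = []
    for block in TOOL_CALL_RE.findall(assistant_text):
        tree = ast.parse(block.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of calls")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                raise ValueError("expected plain function calls")
            if node.args:
                raise ValueError("this sketch only handles keyword arguments")
            # literal_eval accepts literals only, so no model-written code is executed.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


def get_candidate_status(candidate_id):
    # Hypothetical stand-in for a real backend lookup.
    return {"candidate_id": candidate_id, "status": "Interview Scheduled"}


REGISTRY = {"get_candidate_status": get_candidate_status}


def run_tool_calls(assistant_text):
    """Step 3: execute each call and package the results under the tool role."""
    results = [REGISTRY[name](**kwargs)
               for name, kwargs in parse_tool_calls(assistant_text)]
    return {"role": "tool", "content": json.dumps(results)}


reply = ('<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
         '<|tool_call_end|>Checking the current status of candidate ID 12345.')
tool_message = run_tool_calls(reply)
print(tool_message)
```

Parsing with `ast` instead of `eval` keeps the model output from running as code. Appending `tool_message` to the conversation and generating again would yield the final answer (step 4).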

🏃 Inference

LFM2.5-8B-A1B is supported by many inference frameworks. See the inference documentation for the full list.

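Name	Description	Docs	Notebook
Transformers	Simple inference with direct access to model internals.	Link
vLLM	High-throughput production deployment on GPUs.	Link
llama.cpp	Cross-platform inference with CPU offloading.	Link
MLX	Apple's machine learning framework, optimized for Apple Silicon.	Link	-
LM Studio	Desktop application for running LLMs locally.	Link	-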

Quick start with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2.5-8B-A1B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
#   attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

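# Render the chat template and tokenize it. return_dict=True makes the
# dict-style access below explicit instead of relying on the library default.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
)["input_ids"].to(model.device)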

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.2,
    top_k=80,
    repetition_penalty=1.05,
    max_new_tokens=8192,
    streamer=streamer,
)
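
When the model is served behind an OpenAI-compatible HTTP endpoint instead (vLLM provides one), the same sampling settings can be sent in the request body. The sketch below only builds the request and does not send it. The base URL, port, and helper name `build_chat_request` are assumptions for illustration, and `top_k` and `repetition_penalty` are not part of the standard OpenAI schema: vLLM accepts them as extra fields, while other servers may use different names or ignore them.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000/v1"):
    """Build (but do not send) a chat completion request with the recommended settings."""
    payload = {
        "model": "LiquidAI/LFM2.5-8B-A1B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "top_k": 80,                 # server-specific extension field
        "repetition_penalty": 1.05,  # server-specific extension field
        "max_tokens": 8192,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("What is C. elegans?")
print(request.full_url)

# To send it once a server is running at base_url:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```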

🔧 Fine-tuning

For best results, we recommend fine-tuning LFM2.5 on your specific use case.

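Name	Description	Docs
CPT (Unsloth)	Continued pre-training for text completion using Unsloth.	Link
CPT (Unsloth)	Continued pre-training for translation using Unsloth.	Link
SFT (Unsloth)	Supervised fine-tuning with LoRA using Unsloth.	Link
SFT (TRL)	Supervised fine-tuning with LoRA using TRL.	Link
DPO (TRL)	Direct Preference Optimization with LoRA using TRL.	Link
GRPO (Unsloth)	GRPO with LoRA using Unsloth.	Link
GRPO (TRL)	GRPO with LoRA using TRL.	Link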

📊 Performance

Improvements over LFM2-8B-A1B

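Thanks to reasoning optimization, scaled-up pre-training, and large-scale reinforcement learning (RL), LFM2.5-8B-A1B outperforms its predecessor across the board: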

Benchmark	LFM2-8B-A1B	LFM2.5-8B-A1B	Change
AA-Omniscience Index	-78.42	-24.70	+53.72
AA-Omniscience Accuracy	7.33	8.67	+1.34
AA-Omniscience Non-Hallucination Rate	7.46	63.47	+56.01
IFEval	79.44	91.84	+12.40
IFBench	26.00	56.47	+30.47
Multi-IF	58.54	79.93	+21.39
MATH500	74.80	88.76	+13.96
AIME25	20.00	42.53	+22.53
BFCLv3	45.07	64.36	+19.29
BFCLv4	25.52	48.50	+22.98
Tau² Telecom	13.60	88.07	+74.47
Tau² Retail	7.02	39.82	+32.80
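
The three AA-Omniscience rows in the table above appear to be linked: if the index is read as the percentage of correct answers minus the percentage of incorrect (hallucinated) answers, with abstentions scoring zero, then it can be recovered from the accuracy and the non-hallucination rate. The sketch below checks that reading against the reported figures. The formula is an inference that happens to match the numbers, not a definition taken from this model card, and the function name is hypothetical. Artificial Analysis remains the authoritative source for the metric.

```python
def omniscience_index(accuracy, non_hallucination_rate):
    """Estimate the index from accuracy and non-hallucination rate (both in percent)."""
    # Share of all questions answered incorrectly: the non-correct responses
    # that were wrong answers rather than abstentions.
    incorrect = (100.0 - accuracy) * (1.0 - non_hallucination_rate / 100.0)
    return accuracy - incorrect


lfm25 = omniscience_index(8.67, 63.47)
lfm2 = omniscience_index(7.33, 7.46)
print(round(lfm25, 2))  # about -24.69 (reported: -24.70)
print(round(lfm2, 2))   # about -78.43 (reported: -78.42)
```

Under this reading, most of the gain over LFM2-8B-A1B comes from answering wrongly far less often, not from knowing more: accuracy moves by little more than a point while the non-hallucination rate rises by about 56 points.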

Knowledge and Instruction Following

Model	Parameters	AA-Omni. Index	AA-Omni. Accuracy	AA-Omni. Non-Halluc.	IFEval	IFBench	Multi-IF
LFM2.5-8B-A1B	8B/A1B	-24.70	8.67	63.47	91.84	56.47	79.93
Granite-4.0-H-Tiny	7B/A1B	-75.50	9.37	6.38	82.23	21.28	59.00
Qwen3.5-4B	4B	-51.53	17.20	16.99	87.80	50.38	67.43
Qwen3-30B-A3B-Thinking-2507	30.5B/3.3B	-51.31	18.80	13.87	90.82	51.11	79.04
Gemma-4-E2B-IT	5.1B	-72	7.00	15.05	82.93	33.53	69.70
Gemma-4-E4B-IT	8B	-50.67	8.10	36.06	87.74	39.48	77.58
Gemma-4-26B-A4B-IT	26B/4B	-62.07	14.37	10.75	91.40	47.25	82.06
gpt-oss-20b	21B/3.6B	-49.17	14.57	24.50	86.73	58.65	76.64

Math and Agentic Workflows

Model	Parameters	MATH500	AIME25	AIME26	BFCLv3	BFCLv4	Tau² Telecom	Tau² Retail
LFM2.5-8B-A1B	8B/A1B	88.76	42.53	50.00	64.79	49.73	88.07	39.82
Granite-4.0-H-Tiny	7B/A1B	59.20	4.93	3.33	56.89	28.52	16.67	18.42
Qwen3.5-4B	4B	80.76	54.28	58.33	71.06	54.01	87.72	71.93
Qwen3-30B-A3B-Thinking-2507	30.5B/3.3B	86.48	71.67	66.67	73.39	50.53	21.93	56.14
Gemma-4-E2B-IT	5.1B	64.00	26	30	56.44	31.91	22.37	18.95
Gemma-4-E4B-IT	8B	65.00	34.33	40.67	57.31	33.92	26.75	42.11

CPU Inference

GPU Inference

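LFM2.5-8B-A1B is the fastest model in its size class, reaching 18.5K output tokens per second under high concurrency, so a single H100 can process more than 1.6 billion tokens per day.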
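
The daily figure follows from the per-second throughput by simple arithmetic, sketched below. It assumes the peak rate is sustained for a full 24 hours. With the rounded value of 18.5K tokens per second the product lands just under 1.6 billion, so the quoted daily total presumably reflects the unrounded measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tokens_per_second = 18_500  # rounded throughput quoted above
tokens_per_day = tokens_per_second * SECONDS_PER_DAY

print(f"{tokens_per_day:,}")  # 1,598,400,000
```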

📬 Contact

Have questions or want to talk? Join our Discord community.
If you are interested in custom solutions for edge deployment, contact our sales team.

Citation

@article{liquidAI20268BA1B,
  author  = {Liquid AI},
  title   = {LFM2.5-8B-A1B: Personal Assistant On Your Laptop},
  journal = {Liquid AI Blog},
  year    = {2026},
  note    = {www.liquid.ai/blog/lfm2-5-8b-a1b},
}

@article{liquidai2025lfm2,
  title   = {LFM2 Technical Report},
  author  = {Liquid AI},
  journal = {arXiv preprint arXiv:2511.23404},
  year    = {2025}
}