
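LFM2.5-VL-450M

LFM2.5-VL-450M is Liquid AI's refreshed version of its first vision-language model, LFM2-VL-450M. It is built on an updated backbone, LFM2.5-350M, and tuned for stronger real-world performance. For more about the LFM2.5 family, see our blog post.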

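- Enhanced instruction following on vision and language tasks.
- Improved multilingual vision understanding in Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.
- Bounding box prediction and object detection for grounded visual understanding.
- Function calling support for text-only input.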

🎥⚡️ You can try LFM2.5-VL-450M locally in your browser with our real-time video stream captioning WebGPU demo 🎥⚡️

Alternatively, you can try the API model on the Playground.

📄 Model details

LFM2.5-VL-450M is a general-purpose vision-language model with the following features:

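- Language model backbone: LFM2.5-350M
- Vision encoder: SigLIP2 NaFlex shape-optimized 86M
- Context length: 32,768 tokens
- Vocabulary size: 65,536
- Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish
- Native resolution processing: handles images up to 512×512 pixels without upscaling and preserves non-standard aspect ratios without distortion
- Tiling strategy: splits large images into non-overlapping 512×512 patches and includes thumbnail encoding for global context
- Inference-time flexibility: the maximum number of image tokens and the tile count are user-tunable, trading speed against quality without retraining
- Generation parameters:
  - Text: temperature=0.1, min_p=0.15, repetition_penalty=1.05
  - Vision: min_image_tokens=32, max_image_tokens=256, do_image_splitting=True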
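
The tiling strategy described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the processor's actual implementation: the helper names `tile_grid` and `count_encoded_views` are hypothetical, and it assumes the image is covered by a simple ceiling-division grid with one extra thumbnail view whenever the image is split. The real processor may resize the image to fit the grid and cap the number of tiles.

```python
import math

TILE = 512  # tile edge in pixels, taken from the model details above


def tile_grid(width: int, height: int, tile: int = TILE) -> tuple[int, int]:
    """Return (columns, rows) of non-overlapping tiles needed to cover the image."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return math.ceil(width / tile), math.ceil(height / tile)


def count_encoded_views(width: int, height: int, tile: int = TILE) -> int:
    """Tiles plus one thumbnail when the image is split; a single view otherwise."""
    cols, rows = tile_grid(width, height, tile)
    tiles = cols * rows
    # Assumption: the thumbnail is only added when the image is actually split.
    return tiles + 1 if tiles > 1 else 1


print(tile_grid(1024, 768))            # (2, 2)
print(count_encoded_views(1024, 768))  # 5
print(count_encoded_views(400, 300))   # 1
```

Under these assumptions, a 1024×768 image yields four tiles plus a thumbnail, while an image that fits inside 512×512 is encoded as a single view.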

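Model	Description
LFM2.5-VL-450M	Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM.
LFM2.5-VL-450M-GGUF	Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage.
LFM2.5-VL-450M-ONNX	ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile).
LFM2.5-VL-450M-MLX-8bit	MLX format for Apple Silicon. Optimized with mlx-vlm for fast on-device inference on Mac. Also available in 4bit, 5bit, 6bit, and bf16.

We recommend it for general vision-language tasks, image captioning, and object detection. It is less suited to knowledge-intensive tasks or fine-grained OCR.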

Chat template

LFM2.5-VL uses a ChatML-like format. See the chat template documentation for details.

<|startoftext|><|im_start|>system
You are a helpful multimodal assistant by Liquid AI.<|im_end|>
<|im_start|>user
<image>Describe this image.<|im_end|>
<|im_start|>assistant
This image shows a Caenorhabditis elegans (C. elegans) nematode.<|im_end|>

You can use processor.apply_chat_template() to format your messages automatically.
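
To make the layout of the template concrete, here is a minimal sketch that reproduces the example string above by hand. The `render_chat` helper is hypothetical and for illustration only: the authoritative template ships with the processor, may differ in whitespace handling, and expands the `<image>` placeholder into image tokens, which this sketch does not attempt.

```python
def render_chat(messages, system_prompt=None, add_generation_prompt=False):
    """Render messages into the ChatML-like layout shown in the example above."""
    parts = ["<|startoftext|>"]
    if system_prompt is not None:
        parts.append(f"<|im_start|>system\n{system_prompt}<|im_end|>\n")
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "<image>Describe this image."}],
    system_prompt="You are a helpful multimodal assistant by Liquid AI.",
    add_generation_prompt=True,
)
print(prompt)
```

In practice, prefer processor.apply_chat_template(), which also handles tokenization and image inputs.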

🏃 Inference

You can run LFM2.5-VL-450M with Hugging Face transformers v5.1 or newer:

pip install transformers pillow

from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

# Load model and processor
model_id = "LiquidAI/LFM2.5-VL-450M"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16"
)
processor = AutoProcessor.from_pretrained(model_id)

# Load image and create conversation
url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
image = load_image(url)
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is in this image?"},
        ],
    },
]

# Generate Answer
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])

# This image captures the iconic Statue of Liberty standing majestically on Liberty Island in New York City. The statue, a symbol of freedom and democracy, is prominently featured in the foreground, its greenish-gray hue contrasting beautifully with the surrounding water.
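
The call above relies on default generation settings. As a sketch, the recommended text generation parameters from the model details could be collected as keyword arguments for `generate`. Setting `do_sample=True` is an assumption on our part: in transformers, `temperature` and `min_p` only take effect when sampling is enabled.

```python
# Recommended text sampling settings from the model details section.
GENERATION_KWARGS = {
    "do_sample": True,           # assumption: enable sampling so temperature/min_p apply
    "temperature": 0.1,
    "min_p": 0.15,
    "repetition_penalty": 1.05,
    "max_new_tokens": 64,        # same token budget as the example above
}

# With `model` and `inputs` from the example above, the call would look like:
# outputs = model.generate(**inputs, **GENERATION_KWARGS)
print(GENERATION_KWARGS)
```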

Visual grounding

LFM2.5-VL-450M supports bounding box prediction:

url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
image = load_image(url)
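query = "statue"
# Literal braces are doubled so the f-string does not treat the JSON example as a replacement field
prompt = f'Detect all instances of: {query}. Response must be a JSON array: [{{"label": ..., "bbox": [x1, y1, x2, y2]}}, ...]. Coordinates are normalized to [0,1].'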

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    },
]

# Generate Answer
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])

# [{"label": "statue", "bbox": [0.3, 0.25, 0.4, 0.65]}]
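
Because the coordinates are normalized to [0,1], they need to be scaled by the image size before drawing or cropping. The following is a minimal sketch: the `to_pixel_boxes` helper and the 1000×800 image size are hypothetical, and it assumes the model returned valid JSON in the requested format (`json.loads` raises otherwise).

```python
import json


def to_pixel_boxes(model_output: str, width: int, height: int):
    """Parse the JSON detections and scale normalized boxes to pixel coordinates."""
    detections = json.loads(model_output)
    boxes = []
    for det in detections:
        x1, y1, x2, y2 = det["bbox"]
        boxes.append({
            "label": det["label"],
            # x values scale with the image width, y values with the height.
            "bbox": (round(x1 * width), round(y1 * height),
                     round(x2 * width), round(y2 * height)),
        })
    return boxes


sample = '[{"label": "statue", "bbox": [0.3, 0.25, 0.4, 0.65]}]'
print(to_pixel_boxes(sample, width=1000, height=800))
# [{'label': 'statue', 'bbox': (300, 200, 400, 520)}]
```

With a PIL image, `image.size` gives the `(width, height)` pair to pass in.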

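Tool use

LFM2.5 supports function calling for text-only input by applying the chat template with the tokenizer. See the tool use documentation for the full guide.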

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
    }
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Apply chat template with tools
inputs = processor.tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)
input_ids = inputs["input_ids"].to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
response = processor.tokenizer.decode(outputs[0, input_ids.shape[1]:], skip_special_tokens=False)

# <|tool_call_start|>[get_weather(location="Paris")]<|tool_call_end|>I am retrieving the current weather for Paris.<|im_end|>
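
The decoded response wraps a Python-style list of calls between `<|tool_call_start|>` and `<|tool_call_end|>`. A minimal sketch of how such a response could be parsed is shown below. The `parse_tool_calls` helper is hypothetical, and it assumes calls use keyword arguments with literal values, as in the sample output above; other output shapes would need extra handling.

```python
import ast
import re

TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def parse_tool_calls(response: str):
    """Extract (function_name, keyword_arguments) pairs from a decoded response."""
    calls = []
    for payload in TOOL_CALL_RE.findall(response):
        tree = ast.parse(payload.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a list of calls")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                raise ValueError("expected plain function calls")
            # literal_eval only accepts literals, so no model output is executed.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


sample = ('<|tool_call_start|>[get_weather(location="Paris")]<|tool_call_end|>'
          'I am retrieving the current weather for Paris.<|im_end|>')
print(parse_tool_calls(sample))  # [('get_weather', {'location': 'Paris'})]
```

Parsing with `ast` rather than `eval` keeps the model's output from being executed as code; the extracted name and arguments can then be dispatched to your own function.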

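Name	Description	Docs
Transformers	Simple inference with direct access to model internals.	Link
vLLM	High-throughput production deployments with GPU.	Link
SGLang	High-throughput production deployments with GPU.	Link
llama.cpp	Cross-platform inference with CPU offloading.	Link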

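🔧 Fine-tuning

We recommend fine-tuning LFM2.5-VL-450M on your use case to maximize performance.

Notebook	Description	Link
SFT (Unsloth)	Supervised fine-tuning with LoRA using Unsloth.
SFT (TRL)	Supervised fine-tuning with LoRA using TRL.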

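📊 Performance

LFM2.5-VL-450M outperforms LFM2-VL-450M on most vision and language benchmarks (MMMU and InfoVQA are the exceptions in the tables below), and adds two new capabilities: bounding box prediction, measured on RefCOCO-M, and function calling support, measured by BFCLv4.

Vision benchmarks

Model	MMStar	RealWorldQA	MMBench (dev en)	MMMU (val)	POPE	MMVet	BLINK	InfoVQA (val)	OCRBench	MM-IFEval	MMMB	CountBench	RefCOCO-M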
LFM2.5-VL-450M	43.00	58.43	60.91	32.67	86.93	41.10	43.92	43.02	684	45.00	68.09	73.31	81.28
LFM2-VL-450M	40.87	52.03	56.27	34.44	83.79	33.85	42.61	44.56	657	33.09	54.29	47.64	-
SmolVLM2-500M	38.20	49.90	52.32	34.10	82.67	29.90	40.70	24.64	609	11.27	46.79	61.81	-

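All vision benchmark scores were obtained using VLMEvalKit. Multilingual scores are averages over English benchmarks translated by GPT-4.1-mini into Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Language benchmarks

Model	GPQA	MMLU Pro	IFEval	Multi-IF	BFCLv4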
LFM2.5-VL-450M	25.66	19.32	61.16	34.63	21.08
LFM2-VL-450M	23.13	17.22	51.75	26.21	-
SmolVLM2-500M	23.84	13.57	30.14	6.82	-

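📬 Contact

- Got questions or want to connect? Join our Discord community.
- If you are interested in custom solutions with edge deployment, please contact our sales team.

Citation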

@article{liquidai2025lfm2,
 title={LFM2 Technical Report},
 author={Liquid AI},
 journal={arXiv preprint arXiv:2511.23404},
 year={2025}
}