granite-docling-258M-mlx

Granite Docling 是一款多模态图文转文本模型，专为高效文档转换而设计。它保留了 Docling 的核心功能，同时与 DoclingDocuments 保持无缝集成，确保完全兼容。

本模型使用 mlx-vlm 版本 0.3.3 从 ibm-granite/granite-docling-258M 转换为 MLX 格式。有关模型的更多详细信息，请参考原始模型卡片。

💡 此 MLX 模型经过优化，可在 Apple Silicon Mac 上高效运行。

如何通过 Docling 使用此模型

如果您通过 🐥Docling 运行，它将自动选择 Granite-Docling 模型的 MLX 版本。您可以通过以下 CLI 选项进行选择：

# Convert to HTML and Markdown:
docling --to html --to md --pipeline vlm --vlm-model granite_docling "https://arxiv.org/pdf/2501.17887" # accepts files, urls or directories

# Convert to HTML including layout visualization:
docling --to html_split_page --show-layout --pipeline vlm --vlm-model granite_docling "https://arxiv.org/pdf/2501.17887"

GraniteDocling result in split page view

如何通过原生 mlx-vlm 使用此模型

您也可以运行纯 mlx-vlm 来生成预测。

要通过 mlx-vlm 命令行界面运行，请使用以下命令：

pip install mlx_vlm 
python -m mlx_vlm.generate --model ibm-granite/granite-docling-258M-mlx --max-tokens 4096 --temperature 0.0 --prompt "Convert this page to docling." --image <path_to_image>

若要使用 mlx-vlm Python SDK 运行，将输出解析为 DoclingDocument 并导出为多种格式（例如 Markdown、HTML），请参考以下代码。

# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "docling-core",
#     "mlx-vlm", 
#     "pillow",
#     "transformers",
# ]
# ///

import webbrowser
from pathlib import Path

from docling_core.types.doc import ImageRefMode
from docling_core.types.doc.document import DocTagsDocument, DoclingDocument
from mlx_vlm import load, stream_generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from transformers.image_utils import load_image

# Configuration
MODEL_PATH = "ibm-granite/granite-docling-258M-mlx"
PROMPT = "Convert this page to docling."
SHOW_IN_BROWSER = True

# Sample images (pick one...)
# SAMPLE_IMAGE = "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/assets/new_arxiv.png"
# SAMPLE_IMAGE = "https://ibm.biz/docling-page-with-list"
SAMPLE_IMAGE = "https://ibm.biz/docling-page-with-table"

# Load model and processor
print("Loading model...")
model, processor = load(MODEL_PATH)
config = load_config(MODEL_PATH)

# Prepare input image and prompt
print("Preparing input...")
pil_image = load_image(SAMPLE_IMAGE)
formatted_prompt = apply_chat_template(processor, config, PROMPT, num_images=1)

# Generate DocTags output
print("Generating DocTags...\n")
output = ""
for token in stream_generate(
    model, processor, formatted_prompt, [pil_image], max_tokens=4096, verbose=False
):
    output += token.text
    print(token.text, end="")
    if "</doctag>" in token.text:
        break

print("\n\nProcessing output...")

# Create DoclingDocument from generated DocTags
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([output], [pil_image])
doc = DoclingDocument.load_from_doctags(doctags_doc, document_name="Sample Document")

# Export to different formats
print("\nMarkdown output:\n")
print(doc.export_to_markdown())

# Save as HTML with embedded images
output_path = Path("./output.html") 
doc.save_as_html(output_path, image_mode=ImageRefMode.EMBEDDED)
print(f"\nHTML saved to: {output_path}")

# Open in browser
if SHOW_IN_BROWSER:
    webbrowser.open(f"file:///{str(output_path.resolve())}")