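Granite Docling is a multimodal image-text-to-text model designed for efficient document conversion. It retains Docling's core functionality while integrating seamlessly with DoclingDocuments, ensuring full compatibility.
This model was converted to MLX format from ibm-granite/granite-docling-258M using mlx-vlm version 0.3.3.
For more details on the model, refer to the original model card.
💡 This MLX model is optimized to run efficiently on Apple Silicon Macs.
If you run it through 🐥Docling, the MLX version of the Granite-Docling model is selected automatically. You can select it with the following CLI options: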
# Convert to HTML and Markdown:
docling --to html --to md --pipeline vlm --vlm-model granite_docling "https://arxiv.org/pdf/2501.17887" # accepts files, urls or directories
# Convert to HTML including layout visualization:
docling --to html_split_page --show-layout --pipeline vlm --vlm-model granite_docling "https://arxiv.org/pdf/2501.17887"
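For batch jobs it can be convenient to assemble these CLI calls from Python. The following is a minimal sketch, assuming the `docling` executable is on your PATH; the helper names `build_docling_command` and `convert` are hypothetical and not part of Docling, and only the flags shown in the commands above are used.

```python
import shlex
import shutil
import subprocess


def build_docling_command(source, formats=("html", "md"), show_layout=False):
    """Build the argv list for the docling CLI calls shown above.

    Hypothetical helper: it only rearranges flags that appear in this README.
    """
    cmd = ["docling"]
    for fmt in formats:
        cmd += ["--to", fmt]
    if show_layout:
        cmd.append("--show-layout")
    cmd += ["--pipeline", "vlm", "--vlm-model", "granite_docling", source]
    return cmd


def convert(source, **kwargs):
    """Run the docling CLI on one source (file, URL, or directory)."""
    if shutil.which("docling") is None:
        raise RuntimeError("docling CLI not found on PATH")
    subprocess.run(build_docling_command(source, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(build_docling_command("https://arxiv.org/pdf/2501.17887")))
```

Passing the arguments as a list avoids shell quoting problems with URLs and paths that contain spaces.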
You can also run plain mlx-vlm to generate predictions.
To run through the mlx-vlm command-line interface, use the following commands:
pip install mlx_vlm
python -m mlx_vlm.generate --model ibm-granite/granite-docling-258M-mlx --max-tokens 4096 --temperature 0.0 --prompt "Convert this page to docling." --image <path_to_image>
To run with the mlx-vlm Python SDK, parse the output into a DoclingDocument, and export it to several formats (for example Markdown and HTML), refer to the following code.
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "docling-core",
#     "mlx-vlm",
#     "pillow",
#     "transformers",
# ]
# ///
import webbrowser
from pathlib import Path
from docling_core.types.doc import ImageRefMode
from docling_core.types.doc.document import DocTagsDocument, DoclingDocument
from mlx_vlm import load, stream_generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from transformers.image_utils import load_image
# Configuration
MODEL_PATH = "ibm-granite/granite-docling-258M-mlx"
PROMPT = "Convert this page to docling."
SHOW_IN_BROWSER = True
# Sample images (pick one...)
# SAMPLE_IMAGE = "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/assets/new_arxiv.png"
# SAMPLE_IMAGE = "https://ibm.biz/docling-page-with-list"
SAMPLE_IMAGE = "https://ibm.biz/docling-page-with-table"
# Load model and processor
print("Loading model...")
model, processor = load(MODEL_PATH)
config = load_config(MODEL_PATH)
# Prepare input image and prompt
print("Preparing input...")
pil_image = load_image(SAMPLE_IMAGE)
formatted_prompt = apply_chat_template(processor, config, PROMPT, num_images=1)
# Generate DocTags output
print("Generating DocTags...\n")
output = ""
for token in stream_generate(
    model, processor, formatted_prompt, [pil_image], max_tokens=4096, verbose=False
):
    output += token.text
    print(token.text, end="")
    if "</doctag>" in token.text:
        break
print("\n\nProcessing output...")
# Create DoclingDocument from generated DocTags
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([output], [pil_image])
doc = DoclingDocument.load_from_doctags(doctags_doc, document_name="Sample Document")
# Export to different formats
print("\nMarkdown output:\n")
print(doc.export_to_markdown())
# Save as HTML with embedded images
output_path = Path("./output.html")
doc.save_as_html(output_path, image_mode=ImageRefMode.EMBEDDED)
print(f"\nHTML saved to: {output_path}")
# Open in browser
if SHOW_IN_BROWSER:
    webbrowser.open(output_path.resolve().as_uri())
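The generation loop above stops only when `</doctag>` appears inside a single streamed chunk. If the closing tag were ever split across chunks, that check would miss it and generation would continue until `max_tokens`. The following is a minimal sketch of a stricter stop check, assuming chunks are plain strings; `collect_until` is a hypothetical helper, not part of mlx-vlm. Unlike the loop above, it also trims any text that follows the closing tag.

```python
def collect_until(chunks, stop="</doctag>"):
    """Concatenate text chunks and stop once `stop` appears in the accumulated text.

    Hypothetical helper: returns the text up to and including the stop marker,
    or everything received if the marker never appears.
    """
    output = ""
    for chunk in chunks:
        output += chunk
        # A newly completed marker can only start in the last
        # len(chunk) + len(stop) - 1 characters, so search only that tail.
        tail_start = max(0, len(output) - len(chunk) - len(stop) + 1)
        idx = output.find(stop, tail_start)
        if idx != -1:
            return output[: idx + len(stop)]
    return output
```

With the script's names, this could be called as `output = collect_until(t.text for t in stream_generate(...))`. Because the generator is consumed lazily, returning early also stops further generation.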