| 名称 | 脚本 | 推理步数 | HuggingFace 仓库 |
|---|---|---|---|
| HiDream-O1-Image-Dev-2604 | inference.py | 28 | 🤗 HiDream-O1-Image-Dev-2604 |
| Prompt Agent 2604 | prompt_agent_v2.py | — | 🤗 HiDream-ai/Prompt-Refine |
git clone https://github.com/HiDream-ai/HiDream-O1-Image.git
cd HiDream-O1-Image
git checkout devpip install -r requirements.txt关于
flash-attn的说明:我们强烈建议安装flash-attn以实现优化的注意力计算。如果您不(或无法)安装flash-attn,则必须编辑models/pipeline.py的第 291 行,将"use_flash_attn": True修改为"use_flash_attn": False——否则推理过程将无法导入内核。
HiDream-O1-Image 附带了一个推理驱动提示词代理(prompt_agent_v2.py),它能明确地对布局、主体属性、物理逻辑和文本渲染细节进行推理,然后将原始用户指令重写为一个自包含的英文提示词。将其输出输入到 inference.py 中,能在处理复杂、推理密集型请求时获得最佳结果。
该代理通过 vLLM 与提供 HiDream-ai/Prompt-Refine 服务的 OpenAI 兼容端点进行交互。
huggingface-cli download HiDream-ai/Prompt-Refine \
--local-dir HiDream-ai/Prompt-Refinebash start_vllm_server.sh这会在 http://localhost:8000/v1 上启动 HiDream-ai/Prompt-Refine。
python prompt_agent_v2.py \
--prompt "A vintage aviation poster featuring a bright red biplane cruising over rolling farmlands. Bold blocky text at the bottom promises adventure in the friendly skies."默认情况下,脚本的目标地址为 http://localhost:8000/v1,模型为 HiDream-ai/Prompt-Refine;如果您在其他位置部署模型,可通过 --base_url 或 --model_id 参数进行覆盖。同一模块还提供了一个可复用的 refine_prompt(prompt, model_id=..., base_url=...) 函数,供 app.py 调用。
推理需要具备 CUDA 能力的 GPU。以下示例使用未蒸馏模型(--model_type full);有关使用蒸馏模型(--model_type dev)运行相同任务的方法,请参见最后一小节。
根据文本提示生成图像:
python inference.py \
--model_path /path/to/HiDream-O1-Image-Dev-2604 \
--prompt "A vintage aviation poster depicting a bright red biplane cruising over rolling farmlands under a partly cloudy sky, with saturated colors and an aged paper texture. A red biplane with two sets of wings and a radial engine is positioned in the upper center of the image, flying toward the right. A pilot with light skin, wearing a brown flight helmet, goggles, and a brown jacket, is visible in the open cockpit. The biplane has black wheels with red hubs and a spinning propeller. Below, the landscape consists of rolling fields in various shades of green, yellow, and brown, divided by dirt roads and scattered with small houses, including a red barn, a brown house, and a white house. In the background, a line of green trees separates the fields from distant hills under a blue sky with white clouds. The poster has a textured, aged paper border with visible creases and discoloration. At the bottom, the text \"ADVENTURE IN THE FRIENDLY SKIES\" is displayed in large, bold, dark brown capital letters across two lines on a light beige background." \
--output_image results/t2i.png \
--height 2048 \
--width 2048本仓库中的代码以及 HiDream-O1-Image-Dev-2604 模型均采用 MIT 许可协议。
@article{hidreamolimage,
title={HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer},
author={Cai, Qi and Chen, Jingwen and Gao, Chengmin and Gong, Zijian and Li, Yehao and Mei, Tao and Pan, Yingwei and Peng, Yi and Qiu, Zhaofan and Yao, Ting and Yu, Kai and Zhang, Yiheng and others},
journal={arXiv preprint arXiv:2605.11061},
year={2026}
}