HuggingFace镜像/HiDream-O1-Image-Dev-2604
模型介绍文件和版本分析
下载使用量0

HiDream-O1-Image-Dev-2604 文本到图像生成

模型

名称脚本推理步数HuggingFace 仓库
HiDream-O1-Image-Dev-2604inference.py28🤗 HiDream-O1-Image-Dev-2604
Prompt Agent 2604prompt_agent_v2.py—🤗 HiDream-ai/Prompt-Refine

安装

  1. 克隆此仓库:
git clone https://github.com/HiDream-ai/HiDream-O1-Image.git
cd HiDream-O1-Image
git checkout dev
  1. 安装所需的依赖项:
pip install -r requirements.txt

关于 flash-attn 的说明:我们强烈建议安装 flash-attn 以实现优化的注意力计算。如果您不(或无法)安装 flash-attn,则必须编辑 models/pipeline.py 的第 291 行,将 "use_flash_attn": True 修改为 "use_flash_attn": False——否则推理过程将无法导入内核。

推理驱动提示词代理

HiDream-O1-Image 附带了一个推理驱动提示词代理(prompt_agent_v2.py),它能明确地对布局、主体属性、物理逻辑和文本渲染细节进行推理,然后将原始用户指令重写为一个自包含的英文提示词。将其输出输入到 inference.py 中,能在处理复杂、推理密集型请求时获得最佳结果。

该代理通过 vLLM 与提供 HiDream-ai/Prompt-Refine 服务的 OpenAI 兼容端点进行交互。

步骤 1 — 下载优化器权重

huggingface-cli download HiDream-ai/Prompt-Refine \
    --local-dir HiDream-ai/Prompt-Refine

步骤 2 — 启动 vLLM 服务器

bash start_vllm_server.sh

这会在 http://localhost:8000/v1 上启动 HiDream-ai/Prompt-Refine。

步骤 3 — 运行优化器

python prompt_agent_v2.py \
    --prompt "A vintage aviation poster featuring a bright red biplane cruising over rolling farmlands. Bold blocky text at the bottom promises adventure in the friendly skies."

默认情况下,脚本的目标地址为 http://localhost:8000/v1,模型为 HiDream-ai/Prompt-Refine;如果您在其他位置部署模型,可通过 --base_url 或 --model_id 参数进行覆盖。同一模块还提供了一个可复用的 refine_prompt(prompt, model_id=..., base_url=...) 函数,供 app.py 调用。

使用方法

推理需要具备 CUDA 能力的 GPU。以下示例使用未蒸馏模型(--model_type full);有关使用蒸馏模型(--model_type dev)运行相同任务的方法,请参见最后一小节。

1. 文本生成图像

根据文本提示生成图像:

python inference.py \
    --model_path /path/to/HiDream-O1-Image-Dev-2604 \
    --prompt "A vintage aviation poster depicting a bright red biplane cruising over rolling farmlands under a partly cloudy sky, with saturated colors and an aged paper texture. A red biplane with two sets of wings and a radial engine is positioned in the upper center of the image, flying toward the right. A pilot with light skin, wearing a brown flight helmet, goggles, and a brown jacket, is visible in the open cockpit. The biplane has black wheels with red hubs and a spinning propeller. Below, the landscape consists of rolling fields in various shades of green, yellow, and brown, divided by dirt roads and scattered with small houses, including a red barn, a brown house, and a white house. In the background, a line of green trees separates the fields from distant hills under a blue sky with white clouds. The poster has a textured, aged paper border with visible creases and discoloration. At the bottom, the text \"ADVENTURE IN THE FRIENDLY SKIES\" is displayed in large, bold, dark brown capital letters across two lines on a light beige background." \
    --output_image results/t2i.png \
    --height 2048 \
    --width 2048

许可协议

本仓库中的代码以及 HiDream-O1-Image-Dev-2604 模型均采用 MIT 许可协议。

引用说明

@article{hidreamolimage,
  title={HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer},
  author={Cai, Qi and Chen, Jingwen and Gao, Chengmin and Gong, Zijian and Li, Yehao and Mei, Tao and Pan, Yingwei and Peng, Yi and Qiu, Zhaofan and Yao, Ting and Yu, Kai and Zhang, Yiheng and others},
  journal={arXiv preprint arXiv:2605.11061},
  year={2026}
}