FLUX.1-dev-Controlnet-Union

发布

[2024/08/26] 🔥 发布 FLUX.1-dev-ControlNet-Union-Pro。请在下一次发布前从源代码安装。我们已通过此PR支持CN-Union和Multi-ControlNets。
[2024/08/20] 发布测试版。
[2024/08/14] 发布Alpha版。

检查点

联合ControlNet的训练需要大量的计算资源。当前发布的是第一个测试版检查点，可能尚未完全训练。完全训练的测试版正在训练过程中。我们已经进行了消融研究，证明了代码的有效性。第一个测试版的开源发布纯粹是为了促进开源社区和Flux生态系统的快速发展；常见的情况是会遇到不良案例（请接受我的歉意）。值得注意的是，我们发现即使是一个完全训练的联合模型，其表现也可能不如专门的模型，例如姿态控制。然而，随着训练的进行，联合模型的表现将继续接近专门模型的水平。

控制模式

控制模式	描述	当前模型有效性
0	canny	🟢高
1	tile	🟢高
2	depth	🟢高
3	blur	🟢高
4	pose	🟢高
5	gray	🔴低
6	lq	🟢高

推理

import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model = 'InstantX/FLUX.1-dev-Controlnet-Union'

controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

control_image_canny = load_image("https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union-alpha/resolve/main/images/canny.jpg")
controlnet_conditioning_scale = 0.5
control_mode = 0

width, height = control_image.size

prompt = 'A bohemian-style female travel blogger with sun-kissed skin and messy beach waves.'

image = pipe(
    prompt, 
    control_image=control_image,
    control_mode=control_mode,
    width=width,
    height=height,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    num_inference_steps=24, 
    guidance_scale=3.5,
).images[0]
image.save("image.jpg")

多控制推理

import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel, FluxMultiControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union = 'InstantX/FLUX.1-dev-Controlnet-Union'

controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union, torch_dtype=torch.bfloat16)
controlnet = FluxMultiControlNetModel([controlnet_union]) # we always recommend loading via FluxMultiControlNetModel

pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = 'A bohemian-style female travel blogger with sun-kissed skin and messy beach waves.'
control_image_depth = load_image("https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union/resolve/main/images/depth.jpg")
control_mode_depth = 2

control_image_canny = load_image("https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union/resolve/main/images/canny.jpg")
control_mode_canny = 0

width, height = control_image.size

image = pipe(
    prompt, 
    control_image=[control_image_depth, control_image_canny],
    control_mode=[control_mode_depth, control_mode_canny],
    width=width,
    height=height,
    controlnet_conditioning_scale=[0.2, 0.4],
    num_inference_steps=24, 
    guidance_scale=3.5,
    generator=torch.manual_seed(42),
).images[0]

资源

致谢

感谢zzzzzero在训练过程中帮助我们指出了一些bug。

检查点

控制模式

描述

当前模型有效性

canny

🟢高

tile

🟢高

depth

🟢高

blur

🟢高

pose

🟢高

gray

🔴低

🟢高

推理

import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model = 'InstantX/FLUX.1-dev-Controlnet-Union'

controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

control_image_canny = load_image("https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union-alpha/resolve/main/images/canny.jpg")
controlnet_conditioning_scale = 0.5
control_mode = 0

width, height = control_image.size

prompt = 'A bohemian-style female travel blogger with sun-kissed skin and messy beach waves.'

image = pipe(
    prompt, 
    control_image=control_image,
    control_mode=control_mode,
    width=width,
    height=height,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    num_inference_steps=24, 
    guidance_scale=3.5,
).images[0]
image.save("image.jpg")

多控制推理

import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel, FluxMultiControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union = 'InstantX/FLUX.1-dev-Controlnet-Union'

controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union, torch_dtype=torch.bfloat16)
controlnet = FluxMultiControlNetModel([controlnet_union]) # we always recommend loading via FluxMultiControlNetModel

pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = 'A bohemian-style female travel blogger with sun-kissed skin and messy beach waves.'
control_image_depth = load_image("https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union/resolve/main/images/depth.jpg")
control_mode_depth = 2

control_image_canny = load_image("https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union/resolve/main/images/canny.jpg")
control_mode_canny = 0

width, height = control_image.size

image = pipe(
    prompt, 
    control_image=[control_image_depth, control_image_canny],
    control_mode=[control_mode_depth, control_mode_canny],
    width=width,
    height=height,
    controlnet_conditioning_scale=[0.2, 0.4],
    num_inference_steps=24, 
    guidance_scale=3.5,
    generator=torch.manual_seed(42),
).images[0]