Stable Diffusion 3.5 Large ControlNet - Canny

ControlNet Demo Image

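Model

This repository provides the Canny ControlNet for Stable Diffusion 3.5 Large.

Please note: This model is released under the Stability Community License. Visit Stability AI to learn more, or contact us for commercial licensing details.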

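License

Here are the key components of the license:

Free for non-commercial use: Individuals and organizations can use the model free of charge for non-commercial use, including scientific research.
Free for commercial use (up to $1M in annual revenue): Startups, small to medium-sized businesses, and creators can use the model for commercial purposes at no cost, as long as their total annual revenue is less than $1M.
Ownership of outputs: Retain ownership of the media generated without restrictive licensing implications.

For organizations with annual revenue of more than $1M, please contact us to inquire about an Enterprise License.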

Usage

Using ControlNets in the SD3.5 Standalone Repo

Install the repo:

git clone git@github.com:Stability-AI/sd3.5.git
cd sd3.5
pip install -r requirements.txt

Then, download the models and sample images so that they sit at the following paths:

input/sample_cond.png
models/clip_g.safetensors
models/clip_l.safetensors
models/t5xxl.safetensors
models/sd3.5_large.safetensors
models/canny_8b.safetensors
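
Before running inference, it can help to confirm that every file in the list above is where the script will look for it. The following is a minimal sketch, assuming it is run from the repository root; the helper name missing_files is hypothetical and not part of the SD3.5 repo.

```python
from pathlib import Path

# Paths copied from the list above, relative to the repository root.
EXPECTED_FILES = [
    "input/sample_cond.png",
    "models/clip_g.safetensors",
    "models/clip_l.safetensors",
    "models/t5xxl.safetensors",
    "models/sd3.5_large.safetensors",
    "models/canny_8b.safetensors",
]


def missing_files(root="."):
    """Return the expected files that are not present under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected files are in place.")
```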

Then you can run:

python sd3_infer.py --controlnet_ckpt models/canny_8b.safetensors --controlnet_cond_image input/sample_cond.png --prompt "An adorable fluffy pastel creature"

This should produce an image like the one below:

An adorable fluffy pastel creature
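
For batch jobs, the same command can be launched from Python. The sketch below only assembles flags that appear in this README (--controlnet_ckpt, --controlnet_cond_image, --prompt, and the optional --text_encoder_device); the function names build_infer_command and run_inference are hypothetical, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_infer_command(prompt,
                        controlnet_ckpt="models/canny_8b.safetensors",
                        cond_image="input/sample_cond.png",
                        text_encoder_device=None):
    """Assemble the sd3_infer.py argument list using only flags shown in this README."""
    cmd = [
        sys.executable, "sd3_infer.py",
        "--controlnet_ckpt", controlnet_ckpt,
        "--controlnet_cond_image", cond_image,
        "--prompt", prompt,
    ]
    # Optional flag: only added when a device name is given.
    if text_encoder_device is not None:
        cmd += ["--text_encoder_device", text_encoder_device]
    return cmd


def run_inference(prompt, **kwargs):
    """Run the script in the current directory and raise if it exits with an error."""
    subprocess.run(build_infer_command(prompt, **kwargs), check=True)
```

Calling run_inference("An adorable fluffy pastel creature") would then be equivalent to the command shown earlier. Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.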

Using ControlNets in Diffusers

Make sure you upgrade to the latest version of diffusers: pip install -U diffusers. Then you can run:

import torch
from diffusers import StableDiffusion3ControlNetPipeline, SD3ControlNetModel
from diffusers.utils import load_image
from diffusers.image_processor import VaeImageProcessor

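class SD3CannyImageProcessor(VaeImageProcessor):
    def __init__(self):
        # Skip the default [-1, 1] normalization; scaling is applied in preprocess().
        super().__init__(do_normalize=False)

    def preprocess(self, image, **kwargs):
        image = super().preprocess(image, **kwargs)
        # Rescale the control image to the range used by this ControlNet.
        image = image * 255 * 0.5 + 0.5
        return image

    def postprocess(self, image, do_denormalize=True, **kwargs):
        # Always denormalize every image in the batch, whatever the caller passed.
        do_denormalize = [True] * image.shape[0]
        image = super().postprocess(image, **kwargs, do_denormalize=do_denormalize)
        return image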

controlnet = SD3ControlNetModel.from_pretrained("stabilityai/stable-diffusion-3.5-large-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
pipe.image_processor = SD3CannyImageProcessor()

control_image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/canny.png")
prompt =  "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms"

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(
    prompt, 
    control_image=control_image, 
    controlnet_conditioning_scale=1.0,
    guidance_scale=3.5,
    num_inference_steps=60,
    generator=generator,
    max_sequence_length=77,
).images[0]
image.save("canny-8b.jpg")

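Preprocessing

An input image can be preprocessed for control use with the code snippet below. SD3.5 does not implement this step, so we recommend doing it in an external script beforehand.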

import cv2
import numpy as np
# assuming img is a PIL image
img = np.array(img.convert("RGB"))  # HWC uint8 array, as OpenCV expects
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
img = cv2.Canny(img, 100, 200)  # edge map using thresholds 100 and 200

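Tips

We recommend starting with a ControlNet strength of 0.8 and adjusting as needed.
The Euler sampler and a slightly higher step count (50-60) give the best results.
Pass --text_encoder_device <device_name> to load the text encoders directly to VRAM, which can speed up the full inference loop at the cost of extra VRAM usage.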
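
One way to act on the first two tips in Diffusers is to sweep the strength upward from 0.8 at a fixed step count. The sketch below only builds the keyword arguments; it assumes that "ControlNet strength" corresponds to the controlnet_conditioning_scale argument used in the Diffusers example, and the helper name conditioning_sweep is hypothetical.

```python
def conditioning_sweep(start=0.8, step=0.1, count=3, num_inference_steps=50):
    """Build keyword-argument dicts for successive pipeline calls."""
    return [
        {
            # Rounded to avoid floating-point noise such as 0.9000000000000001.
            "controlnet_conditioning_scale": round(start + i * step, 2),
            "num_inference_steps": num_inference_steps,
        }
        for i in range(count)
    ]


for kwargs in conditioning_sweep():
    print(kwargs)
    # With the Diffusers pipeline loaded as `pipe`, each setting could be used as:
    # image = pipe(prompt, control_image=control_image, **kwargs).images[0]
```

With the defaults this yields strengths of 0.8, 0.9, and 1.0, each at 50 steps, so the outputs can be compared side by side before settling on a value.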

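Uses

All uses of the model must be in accordance with our Acceptable Use Policy.

Out-of-Scope Uses

The model was not trained to produce factual or true representations of people or events. As such, using the model to generate such content is out of scope for this model.

Training Data and Strategy

These models were trained on a wide variety of data, including synthetic data and filtered publicly available data.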

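Safety

We believe in safe, responsible AI practices and take deliberate measures to ensure integrity starts at the early stages of development. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3.5 by bad actors. For more information about our approach to safety, please visit our Safety page.

Integrity Evaluation

Our integrity evaluation methods include structured evaluations and red-teaming for specific harms. Testing was conducted primarily in English and may not cover all possible harms.

Risks identified and mitigations:

Harmful content: We used filtered datasets when training our models and implemented safeguards that attempt to strike the right balance between usefulness and preventing harm. However, this does not guarantee that all possible harmful content has been removed. All developers and deployers should exercise caution and implement content safety guardrails based on their specific product policies and application use cases.
Misuse: Technical limitations and developer and end-user education can help mitigate malicious applications of the model. All users are required to adhere to our Acceptable Use Policy, including when applying fine-tuning and prompt engineering mechanisms. Please refer to the Stability AI Acceptable Use Policy for information on violative uses of our products.
Privacy violations: Developers and deployers are encouraged to adhere to privacy regulations with techniques that respect data privacy.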

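Acknowledgements

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala, authors of the original ControlNet paper.
Lvmin Zhang, who developed the Tile ControlNet that inspired the Blur ControlNet.
The authors of the Diffusers library, whose code was referenced during development.
The InstantX team, whose Flux and SD3 ControlNets were referenced during training.
All early testers and raters of the models, and the Stability AI team.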

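Contact

Please report any issues with the model or contact us through the following channels:

Safety issues: safety@stability.ai
Security issues: security@stability.ai
Privacy issues: privacy@stability.ai
License and general: https://stability.ai/license
Enterprise license: https://stability.ai/enterprise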