SDXL-Turbo Model Card

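SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. A real-time demo is available here: http://clipdrop.co/stable-diffusion-turbo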

Model Details

Model Description

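SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis. It is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling from large-scale foundational image diffusion models in 1 to 4 steps at high image quality. The approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal, and combines this with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.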

Developed by: Stability AI
Funded by: Stability AI
Model type: Generative text-to-image model
Finetuned from model: SDXL 1.0 Base

Model Sources

For research purposes, we recommend the generative-models GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference).

Repository: https://github.com/Stability-AI/generative-models
Paper: https://stability.ai/research/adversarial-diffusion-distillation
Demo: http://clipdrop.co/stable-diffusion-turbo

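Evaluation

[Figures: comparison1, comparison2]

The charts above evaluate user preference for SDXL-Turbo over other single-step and multi-step models. In terms of both image quality and prompt following, human raters prefer SDXL-Turbo evaluated at a single step over LCM-XL evaluated at four (or fewer) steps. In addition, using four sampling steps for SDXL-Turbo further improves performance. For details on the user study, please refer to the research paper.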

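Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Research on generative models.
Research on real-time applications of generative models.
Research on the impact of real-time generative models.
Safe deployment of models that have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.

Excluded uses are described below.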

Diffusers

pip install diffusers transformers accelerate --upgrade

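Text-to-image:

SDXL-Turbo does not make use of guidance_scale or negative_prompt; guidance is disabled by setting guidance_scale=0.0. The model preferably generates images of size 512x512, but larger image sizes work as well. A single step is enough to generate high-quality images.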

from diffusers import AutoPipelineForText2Image
import torch
from openmind_hub import snapshot_download

model_dir = snapshot_download("PyTorch-NPU/sdxl-turbo")

pipe = AutoPipelineForText2Image.from_pretrained(model_dir, torch_dtype=torch.float16, variant="fp16")
pipe.to("npu")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
image.save("image.png")
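
The settings described above (guidance disabled, 512x512 output, 1 to 4 sampling steps) can be collected in one place. The following is a minimal sketch: `turbo_text2image_kwargs` is a hypothetical helper, not part of diffusers, and the rejection of step counts outside 1 to 4 is an assumption based on the range this model card describes, not a limit enforced by the library.

```python
from typing import Any, Dict


def turbo_text2image_kwargs(prompt: str, num_inference_steps: int = 1,
                            size: int = 512) -> Dict[str, Any]:
    """Build keyword arguments for a text-to-image pipeline call (hypothetical helper)."""
    # Assumption: only accept the 1 to 4 step range described in this model card.
    if not 1 <= num_inference_steps <= 4:
        raise ValueError("SDXL-Turbo is described as sampling in 1 to 4 steps")
    return {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": 0.0,  # guidance is disabled for SDXL-Turbo
        "height": size,         # 512x512 is the preferred output size
        "width": size,
    }


kwargs = turbo_text2image_kwargs(
    "A cinematic shot of a baby racoon wearing an intricate italian priest robe.",
    num_inference_steps=4,
)
```

With the pipeline loaded as in the example above, the call would then read `image = pipe(**kwargs).images[0]`.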

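Image-to-image:

When using SDXL-Turbo for image-to-image generation, make sure that num_inference_steps * strength is greater than or equal to 1. The image-to-image pipeline runs for int(num_inference_steps * strength) steps, e.g. 0.5 * 2.0 = 1 step in the example below.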

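import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
from openmind_hub import snapshot_download

# Download the model weights and load the fp16 variant in half precision.
model_dir = snapshot_download("PyTorch-NPU/sdxl-turbo")
pipe = AutoPipelineForImage2Image.from_pretrained(model_dir, torch_dtype=torch.float16, variant="fp16")
# Run on the NPU, as in the text-to-image example.
pipe.to("npu")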

init_image = load_image("image.png").resize((512, 512))

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
image.save("image1.png")
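
The step rule stated for image-to-image generation can be checked before calling the pipeline. This is a minimal sketch: `effective_img2img_steps` is a hypothetical helper, not a diffusers function, and the requirement that strength lie between 0 and 1 is stated here as an assumption about how the pipeline is normally called.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Return int(num_inference_steps * strength), or raise if no step would run."""
    # Assumption: strength is expected to lie in the range [0, 1].
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps = int(num_inference_steps * strength)
    if steps < 1:
        raise ValueError(
            "num_inference_steps * strength must be greater than or equal to 1"
        )
    return steps


# Settings from the image-to-image example: 2 steps at strength 0.5.
steps = effective_img2img_steps(num_inference_steps=2, strength=0.5)
```

A combination such as num_inference_steps=1 with strength=0.5 is rejected, because int(1 * 0.5) is 0 and no denoising step would run.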

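Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
The model must not be used in any way that violates Stability AI's Acceptable Use Policy (https://stability.ai/use-policy).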

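Limitations and Bias

Limitations

The generated images have a fixed resolution (512x512 pixels), and the model does not achieve perfect photorealism.
The model cannot render legible text.
Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.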

Recommendations

The model is intended for research purposes only.

How to Get Started with the Model

Check out https://github.com/Stability-AI/generative-models