Make everyone an anime illustrator, even if you know nothing about drawing.


Controlnet-scribble-sdxl-1.0-anime

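This is a controlnet-scribble-sdxl-1.0 model that generates very high quality images from anime sketches and supports lines of any type and any width. As the examples show, the sketch can be very simple and even unclear: we assume you are a child or someone who knows nothing about drawing, and all you need to do is doodle and add a few danbooru tags to get a polished anime illustration. In our evaluation the model achieves state-of-the-art performance and is clearly better than the original SD1.5 Scribble model trained by Lvmin Zhang[https://github.com/lllyasviel/ControlNet]. The model was trained with sophisticated techniques on a high-quality dataset. Besides the aesthetic score, prompt-following ability[proposed by OpenAI in a paper(https://cdn.openai.com/papers/dall-e-3.pdf)] and the image deformity rate[the probability that a generated image contains abnormal human anatomy] also improve significantly.

The founder of Midjourney once said that midjourney can help people who cannot draw to create, and so expand the boundaries of their imagination. We share a similar vision: we hope that people who know little about anime or cartoons can create their own characters in a simple way, express themselves, and unleash their creativity. AIGC will reshape the animation industry. In our evaluation, the anime images generated with the released model have an average aesthetic score higher than that of almost all popular anime websites. Enjoy.

To generate especially appealing images, we recommend combining danbooru tags with natural language. Because there are far fewer anime images than real-world images, a purely natural-language prompt such as "a girl walking in the street" carries too little information. Describe the image in more detail instead, for example "a girl, blue shirt, white hair, black eyes, smile, pink flower, cherry blossoms ...". In short, first describe what is in the image with danbooru tags, then describe what is happening in the image with natural language, and include as much detail as you can. If the description is not clear enough, the generated image will be somewhat random, but it will still match the condition image you drew, and the edges detected in the condition image and in the generated image will be consistent. To some extent the model understands your drawing at the semantic level and gives you a good result.

To our knowledge, no other SDXL-Scribble model has appeared in the open-source community, so we are likely the first.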
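The "tags first, natural language second" advice above can be sketched as a small helper. This is only an illustration: `build_prompt` is a hypothetical name that is not part of the model or of any library, and the example sentence is made up.

```python
def build_prompt(tags, description=""):
    """Join danbooru-style tags first, then an optional natural-language sentence.

    Hypothetical helper for illustration; the model only ever sees the final string.
    """
    cleaned = []
    seen = set()
    for tag in tags:
        tag = tag.strip()
        key = tag.lower()
        # Skip empty entries and case-insensitive duplicates, keep first occurrence.
        if tag and key not in seen:
            seen.add(key)
            cleaned.append(tag)

    parts = [", ".join(cleaned)] if cleaned else []
    description = description.strip()
    if description:
        parts.append(description)
    return ", ".join(parts)


prompt = build_prompt(
    ["1girl", "blue shirt", "white hair", "black eyes", "smile", "pink flower", "cherry blossoms"],
    "a girl standing under a cherry tree on a spring afternoon",
)
print(prompt)
```

The resulting string can be passed as the `prompt` argument in the usage code further down. The more tags and scene detail it contains, the less the output is left to chance.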

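Notes

To generate anime images with our model, you need to choose an anime-style sdxl base model from huggingface[https://huggingface.co/models?pipeline_tag=text-to-image&sort=trending&search=blue] or civitai[https://civitai.com/search/models?baseModel=SDXL%201.0&sortBy=models_v8&query=anime].
The examples shown here were all generated with CounterfeitXL[https://huggingface.co/gsdf/CounterfeitXL/tree/main]. Different base models produce different image styles, and you can also use bluepencil or other models. The model was trained on a large number of anime images, covering nearly all the anime images we could find on the internet. We filtered these images strictly and kept those with high visual quality, comparable to nijijourney or popular anime illustrations. We trained with controlnet-sdxl-1.0[https://arxiv.org/abs/2302.05543]. The technical details are not disclosed in this report.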

Model Description

Developed by: xinsir
Model type: ControlNet_SDXL
License: apache-2.0
Finetuned from model: stabilityai/stable-diffusion-xl-base-1.0

Model Sources

Paper: https://arxiv.org/abs/2302.05543

Examples

prompt: 1girl, breasts, solo, long hair, pointy ears, red eyes, horns, navel, sitting, cleavage, toeless legwear, hair ornament, smoking pipe, oni horns, thighhighs, detached sleeves, looking at viewer, smile, large breasts, holding smoking pipe, wide sleeves, bare shoulders, flower, barefoot, holding, nail polish, black thighhighs, jewelry, hair flower, oni, japanese clothes, fire, kiseru, very long hair, ponytail, black hair, long sleeves, bangs, red nails, closed mouth, toenails, navel cutout, cherry blossoms, water, red dress, fingernails

prompt: 1girl, solo, blonde hair, weapon, sword, hair ornament, hair flower, flower, dress, holding weapon, holding sword, holding, gloves, breasts, full body, black dress, thighhighs, looking at viewer, boots, bare shoulders, bangs, medium breasts, standing, black gloves, short hair with long locks, thigh boots, sleeveless dress, elbow gloves, sidelocks, black background, black footwear, yellow eyes, sleeveless

prompt: 1girl, solo, holding, white gloves, smile, purple eyes, gloves, closed mouth, balloon, holding microphone, microphone, blue flower, long hair, puffy sleeves, purple flower, blush, puffy short sleeves, short sleeves, bangs, dress, shoes, very long hair, standing, pleated dress, white background, flower, full body, blue footwear, one side up, arm up, hair bun, brown hair, food, mini crown, crown, looking at viewer, hair between eyes, heart balloon, heart, tilted headwear, single side bun, hand up

prompt: tiger, 1boy, male focus, blue eyes, braid, animal ears, tiger ears, 2022, solo, smile, chinese zodiac, year of the tiger, looking at viewer, hair over one eye, weapon, holding, white tiger, grin, grey hair, polearm, arm up, white hair, animal, holding weapon, arm behind head, multicolored hair, holding polearm

prompt: 1boy, male child, glasses, male focus, shorts, solo, closed eyes, bow, bowtie, smile, open mouth, red bow, jacket, red bowtie, white background, shirt, happy, black shorts, child, simple background, long sleeves, ^_^, short hair, white shirt, brown hair, black-framed eyewear, :d, facing viewer, black hair

prompt: solo, 1girl, swimsuit, blue eyes, plaid headwear, bikini, blue hair, virtual youtuber, side ponytail, looking at viewer, navel, grey bikini, ribbon, long hair, parted lips, blue nails, hat, breasts, plaid, hair ribbon, water, arm up, bracelet, star (symbol), cowboy shot, stomach, thigh strap, hair between eyes, beach, small breasts, jewelry, wet, bangs, plaid bikini, nail polish, grey headwear, blue ribbon, adapted costume, choker, ocean, bare shoulders, outdoors, beret

prompt: fruit, food, no humans, food focus, cherry, simple background, english text, strawberry, signature, border, artist name, cream

prompt: 1girl, solo, ball, swimsuit, bikini, mole, beachball, white bikini, breasts, hairclip, navel, looking at viewer, hair ornament, chromatic aberration, holding, holding ball, pool, cleavage, water, collarbone, mole on breast, blush, bangs, parted lips, bare shoulders, mole on thigh, bare arms, smile, large breasts, blonde hair, halterneck, hair between eyes, stomach

How to Get Started with the Model

Use the code below to get started with the model.


import random  # used below to pick a method and a line-width threshold

import cv2
import numpy as np
import torch
from controlnet_aux import HEDdetector
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    EulerAncestralDiscreteScheduler,
    StableDiffusionXLControlNetPipeline,
)
from PIL import Image


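def nms(x, t, s):
    """Thin an edge map with non-maximum suppression.

    x: edge map as a numpy array, t: binarization threshold, s: Gaussian blur sigma.
    Returns a uint8 array containing only 0 and 255.
    """
    # Smooth the edge map first so nearby responses merge into one ridge.
    x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)

    # 3x3 line kernels: horizontal, vertical, and the two diagonals.
    f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
    f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
    f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
    f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)

    y = np.zeros_like(x)

    # Keep a pixel if it is the local maximum along at least one direction.
    for f in [f1, f2, f3, f4]:
        np.putmask(y, cv2.dilate(x, kernel=f) == x, x)

    # Binarize: surviving pixels above the threshold become white lines.
    z = np.zeros_like(y, dtype=np.uint8)
    z[y > t] = 255
    return z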


controlnet_conditioning_scale = 1.0  
prompt = "your prompt, the longer the better, describe the image in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'


eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("gsdf/CounterfeitXL", subfolder="scheduler")


controlnet = ControlNetModel.from_pretrained(
    "xinsir/anime-painter",
    torch_dtype=torch.float16
)

# When testing with another base model, you also need to change the VAE.
vae = AutoencoderKL.from_pretrained("gsdf/CounterfeitXL", subfolder="vae", torch_dtype=torch.float16)

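pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "gsdf/CounterfeitXL",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)
# fp16 weights are meant for GPU inference; this assumes a CUDA device is available.
pipe.to("cuda")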

# You can either use HED to generate a fake scribble from an existing image,
# or use a sketch that you drew entirely by yourself.
# The method is picked at random here only to show both paths.
if random.random() > 0.5:
    # Method 1: fake scribble from an image.
    # Provide an image (real or anime); its HED edges are extracted and used as the scribble.
    # For details on HED detection, see
    # https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
    # Below is an example using the controlnet_aux HED detector.

    source_img = Image.open("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    controlnet_img = processor(source_img, scribble=False)
    controlnet_img.save("a hed detect path for an image")

    # The following steps simulate a hand-drawn sketch;
    # different thresholds produce different line widths.
    controlnet_img = np.array(controlnet_img)
    controlnet_img = nms(controlnet_img, 127, 3)
    controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)

    # Higher threshold, thinner line.
    random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
    controlnet_img[controlnet_img > random_val] = 255
    controlnet_img[controlnet_img < 255] = 0
    controlnet_img = Image.fromarray(controlnet_img)

else:
    # Method 2: a sketch drawn entirely by yourself.
    control_path = "the sketch image you draw with some tools, like drawing board, the path you save it"
    # Note: the image must be black and white (0 or 255), like the examples listed above.
    controlnet_img = Image.open(control_path)

# Resize to 1024*1024 or the same resolution bucket (same pixel area) for the best performance.
width, height  = controlnet_img.size
ratio = np.sqrt(1024. * 1024. / (width * height))
new_width, new_height = int(width * ratio), int(height * ratio)
controlnet_img = controlnet_img.resize((new_width, new_height))

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
    ).images

# PNG usually preserves quality better than JPG or WebP, at the cost of a much larger file.
images[0].save("your image save path, e.g. a .png file")
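
The resize step in the script keeps the aspect ratio and scales the sketch to roughly 1024*1024 pixels of area. A minimal standalone sketch of that calculation follows. `fit_to_area` is a hypothetical helper name, and snapping each side to a multiple of 8 is an added assumption (the SDXL VAE downsamples by a factor of 8), not something the script above does.

```python
import math


def fit_to_area(width, height, target_area=1024 * 1024, multiple=8):
    """Scale (width, height) to about `target_area` pixels, keeping the aspect ratio.

    Hypothetical helper: each side is rounded to the nearest `multiple`,
    which is an assumption added here for latent-diffusion friendliness.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Same ratio as in the script: sqrt(target_area / current_area).
    ratio = math.sqrt(target_area / (width * height))

    new_width = max(multiple, round(width * ratio / multiple) * multiple)
    new_height = max(multiple, round(height * ratio / multiple) * multiple)
    return new_width, new_height


for size in [(1024, 1024), (1920, 1080), (512, 768)]:
    print(size, "->", fit_to_area(*size))
```

The same function with `target_area=512 * 768` gives the SD1.5-sized bucket used for the comparison in the evaluation section.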

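Evaluation Data

The test data was randomly sampled from popular anime wallpaper images (pixiv, nijijourney, and so on), in line with the goal of this project: letting everyone draw anime illustrations. We selected 100 images, generated the text prompts with waifu-tagger[https://huggingface.co/spaces/SmilingWolf/wd-tagger], and generated 4 images per prompt, for 400 generated images in total. As for resolution, images from the SDXL model should be 1024 * 1024 or in the same resolution bucket, and images from the SD1.5 model should be 512 * 768 or in the same resolution bucket. For a fair comparison, we resized the SDXL-generated images to 512 * 768 or the same resolution bucket. We computed the Laion aesthetic score to measure how appealing the images are and the perceptual similarity to measure control ability, and found that image quality agrees well with these metric values. We compared our method with other SOTA huggingface models and list the results below. Our model has the highest aesthetic score and, given a good prompt, can generate visually appealing images.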
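The protocol above (a fixed number of images per prompt, then an average over all images) can be sketched as a small aggregation step. The scoring itself is out of scope here: the numbers in the example are placeholders, not results from this evaluation, and `summarize` is a hypothetical helper name.

```python
from statistics import mean


def summarize(scores_by_prompt, images_per_prompt=4):
    """Aggregate per-image metric values into per-prompt and overall means.

    `scores_by_prompt` maps a prompt id to the list of scores of its images.
    Hypothetical helper; the scores are assumed to be computed elsewhere.
    """
    per_prompt = {}
    all_scores = []
    for prompt_id, scores in scores_by_prompt.items():
        if len(scores) != images_per_prompt:
            raise ValueError(f"{prompt_id}: expected {images_per_prompt} scores, got {len(scores)}")
        per_prompt[prompt_id] = mean(scores)
        all_scores.extend(scores)

    return {
        "num_images": len(all_scores),
        "mean_score": mean(all_scores),
        "per_prompt": per_prompt,
    }


# Placeholder values for two prompts, only to show the call shape.
summary = summarize({"prompt_000": [1.0, 2.0, 3.0, 4.0], "prompt_001": [2.0, 2.0, 2.0, 2.0]})
print(summary["num_images"], summary["mean_score"])
```

With 100 prompts and 4 images each, `num_images` would be 400, matching the setup described above.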

Quantitative Results

Metric	xinsir/anime-painter	lllyasviel/control_v11p_sd15_scribble
laion_aesthetic	5.95	5.86
perceptual similarity	0.5171	0.577

laion_aesthetic (higher is better)
perceptual similarity (lower is better)
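
To read the table with the direction of each metric in mind, the differences can be computed directly from the reported values. This is a small sketch: the numbers are copied from the table, and `compare` is a hypothetical helper name.

```python
# Values copied from the table; "higher_is_better" follows the legend.
results = {
    "laion_aesthetic": {"ours": 5.95, "baseline": 5.86, "higher_is_better": True},
    "perceptual similarity": {"ours": 0.5171, "baseline": 0.577, "higher_is_better": False},
}


def compare(ours, baseline, higher_is_better):
    """Return (absolute difference, relative difference in %, whether ours is better)."""
    diff = ours - baseline
    rel_percent = diff / baseline * 100
    better = diff > 0 if higher_is_better else diff < 0
    return diff, rel_percent, better


for name, row in results.items():
    diff, rel_percent, better = compare(row["ours"], row["baseline"], row["higher_is_better"])
    print(f"{name}: diff={diff:+.4f} ({rel_percent:+.1f}%), better={better}")
```

Here "ours" is xinsir/anime-painter and "baseline" is lllyasviel/control_v11p_sd15_scribble.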

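Note: these values were computed on images saved in webp format. When the images are saved as png, the aesthetic score increases by 0.1-0.3, but the relative relationship stays the same.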

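Conclusion

In our evaluation, our model achieves a better aesthetic score on anime images than lllyasviel/control_v11p_sd15_scribble. We wanted to compare against other sdxl-1.0-scribble models but could not find any. Thanks to the larger base model and sophisticated data augmentation, our model also shows better control ability in the perceptual similarity test. In addition, the model is less likely to generate abnormal images, such as images with abnormal human anatomy.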