SoteDiffusion Wuerstchen3

An anime finetune of Würstchen V3.

Release Notes

This release is sponsored by fal.ai/grants
Trained on 6M images for 3 epochs using 8x A100 80G GPUs.

API Usage

This model can be used via API with Fal.AI
For more details: https://fal.ai/models/fal-ai/stable-cascade/sote-diffusion

UI Guide

SD.Next

URL: https://github.com/vladmandic/automatic/

Go to Models -> Huggingface, type Disty0/sotediffusion-wuerstchen3-decoder into the model name field, and press download.
Once the download has finished, load Disty0/sotediffusion-wuerstchen3-decoder.

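Prompt:

newest, extremely aesthetic, best quality,

Negative Prompt:

very displeasing, worst quality, monochrome, realistic, oldest, loli,

Parameters:
Sampler: Default

Steps: 30 or 40
Refiner Steps: 10

CFG: 7
Secondary CFG: 2 or 1

Resolution: 1024x1536, 2048x1152
Any resolution works as long as both dimensions are multiples of 128.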
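
The multiple-of-128 rule can be checked, or enforced, before starting a generation. The following is a minimal sketch: the helper names `is_valid_resolution` and `snap_to_128` are hypothetical and not part of SD.Next or diffusers, and rounding to the nearest multiple is an assumption, since the model card only states that dimensions must be multiples of 128.

```python
def is_valid_resolution(width: int, height: int) -> bool:
    """Return True when both dimensions are multiples of 128."""
    return width % 128 == 0 and height % 128 == 0


def snap_to_128(value: int) -> int:
    """Round a pixel dimension to the nearest multiple of 128 (at least 128).

    Ties are rounded up; this rounding policy is an assumption.
    """
    return max(128, (value + 64) // 128 * 128)


# The two resolutions listed above both pass the check.
print(is_valid_resolution(1024, 1536), is_valid_resolution(2048, 1152))

# A common 1920x1080 target has a height that is not a multiple of 128.
print(snap_to_128(1920), snap_to_128(1080))
```

Snapping changes the aspect ratio slightly, so it may be preferable to pick one of the listed resolutions and crop afterwards.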

ComfyUI

Please refer to CivitAI: https://civitai.com/models/353284

Code Example

pip install diffusers

import torch
from diffusers import StableCascadeCombinedPipeline

device = "cuda"
dtype = torch.bfloat16 # or torch.float16
model = "Disty0/sotediffusion-wuerstchen3-decoder"

pipe = StableCascadeCombinedPipeline.from_pretrained(model, torch_dtype=dtype)

# send everything to the gpu:
pipe = pipe.to(device, dtype=dtype)
pipe.prior_pipe = pipe.prior_pipe.to(device, dtype=dtype)

# or enable model offload to save vram:
# pipe.enable_model_cpu_offload()



prompt = "newest, extremely aesthetic, best quality, 1girl, solo, cat ears, pink hair, orange eyes, long hair, bare shoulders, looking at viewer, smile, indoors, casual, living room, playing guitar,"
negative_prompt = "very displeasing, worst quality, monochrome, realistic, oldest, loli,"
output = pipe(
    width=1024,
    height=1536,
    prompt=prompt,
    negative_prompt=negative_prompt,
    decoder_guidance_scale=2.0,
    prior_guidance_scale=7.0,
    prior_num_inference_steps=30,
    output_type="pil",
    num_inference_steps=10
).images[0]

# do something with the output image, for example save it to disk:
output.save("output.png")

Training:

Software used: Kohya SD-Scripts (Stable Cascade branch).
https://github.com/kohya-ss/sd-scripts/tree/stable-cascade

GPU used: 8x Nvidia A100 80GB
GPU hours: 220

Base

Parameter	Value
amp	bf16
weights	fp32
save weights	fp16
resolution	1024x1024
effective batch size	128
unet lr	1e-5
te lr	4e-6
optimizer	Adafactor
images	6M
epochs	3

Final

Parameter	Value
amp	bf16
weights	fp32
save weights	fp16
resolution	1024x1024
effective batch size	128
unet lr	4e-6
te lr	None
optimizer	Adafactor
images	120K
epochs	16
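
The two tables imply a rough number of optimizer steps per stage: images divided by effective batch size, times epochs. The sketch below works that out. It assumes the rounded image counts from the tables (6M and 120K) and that a final partial batch in each epoch still counts as one step; the real step counts from the training run are not given in this card.

```python
import math


def optimizer_steps(num_images: int, epochs: int, effective_batch_size: int) -> int:
    """Estimate optimizer steps, counting a trailing partial batch as a step."""
    steps_per_epoch = math.ceil(num_images / effective_batch_size)
    return steps_per_epoch * epochs


# Base stage: 6M images, 3 epochs, effective batch size 128.
base_steps = optimizer_steps(6_000_000, 3, 128)

# Final stage: 120K images, 16 epochs, effective batch size 128.
final_steps = optimizer_steps(120_000, 16, 128)

print(base_steps, final_steps)  # 140625 15008
```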

Dataset:

GPU used for captioning: 1x Intel ARC A770 16GB
GPU hours: 350

Model used for captioning: SmilingWolf/wd-swinv2-tagger-v3
Model used for text: llava-hf/llava-1.5-7b-hf

Command:

python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./

Dataset name	Total images
newest	1,848,331
recent	1,380,630
mid	993,227
early	566,152
oldest	160,397
pixiv	343,614
visual novel cg	231,358
anime wallpaper	104,790
Total	5,628,499

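Note:

Minimum size is 1280x600 (768,000 pixels)
Deduplicated by image similarity using czkawka-cli
About 120K very high quality images were intentionally duplicated 5 times, bringing the total image count to 6.2M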
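
The totals above can be cross-checked with a few lines of arithmetic. This sketch sums the per-dataset counts from the table and then adds the duplicated images. It assumes that "duplicated 5 times" means five extra copies of roughly 120,000 images, because that reading reproduces the stated 6.2M; reading it as five copies in total would give about 6.1M instead.

```python
# Per-dataset image counts, copied from the table above.
dataset_counts = {
    "newest": 1_848_331,
    "recent": 1_380_630,
    "mid": 993_227,
    "early": 566_152,
    "oldest": 160_397,
    "pixiv": 343_614,
    "visual novel cg": 231_358,
    "anime wallpaper": 104_790,
}

unique_total = sum(dataset_counts.values())

# Assumptions: the count is approximate ("about 120K") and each image
# receives five additional copies.
HIGH_QUALITY_IMAGES = 120_000
EXTRA_COPIES = 5

effective_total = unique_total + HIGH_QUALITY_IMAGES * EXTRA_COPIES

print(unique_total)      # 5628499
print(effective_total)   # 6228499, i.e. about 6.2M
```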

Tags:

Date:

Tag	Date
newest	2022 to 2024
recent	2019 to 2021
mid	2015 to 2018
early	2011 to 2014
oldest	2005 to 2010
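
The date table is a simple lookup from year to tag. A minimal sketch of that mapping follows; the function name `date_tag` is hypothetical, and returning `None` for years outside 2005 to 2024 is an assumption, since the card does not say how such images were handled.

```python
from typing import Optional

# (tag, first year, last year), both ends inclusive, from the table above.
DATE_TAGS = [
    ("newest", 2022, 2024),
    ("recent", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None when the year is out of range."""
    for tag, first_year, last_year in DATE_TAGS:
        if first_year <= year <= last_year:
            return tag
    return None


print(date_tag(2023), date_tag(2016), date_tag(2004))  # newest mid None
```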

Aesthetic Tags:

Score greater than	Tag	Count
0.90	extremely aesthetic	125,451
0.80	very aesthetic	887,382
0.70	aesthetic	1,049,857
0.50	slightly aesthetic	1,643,091
0.40	not displeasing	569,543
0.30	not aesthetic	445,188
0.20	slightly displeasing	341,424
0.10	displeasing	237,660
Rest of them	very displeasing	328,712

Quality Tags:

Score greater than	Tag	Count
0.980	best quality	1,270,447
0.900	high quality	498,244
0.750	great quality	351,006
0.500	medium quality	366,448
0.250	normal quality	368,380
0.125	bad quality	279,050
0.025	low quality	538,958
Rest of them	worst quality	1,955,966
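
Both score tables follow the same pattern: walk the thresholds from highest to lowest and take the first one the score exceeds, falling back to the last tag otherwise. The sketch below encodes the two tables that way. The function name `score_to_tag` is hypothetical, the strict "greater than" comparison follows the column header, and the scoring models that produce these values are not named in this card.

```python
from typing import List, Tuple

# (threshold, tag) pairs in descending order, copied from the tables above.
AESTHETIC_THRESHOLDS: List[Tuple[float, str]] = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"),
    (0.10, "displeasing"),
]
AESTHETIC_FALLBACK = "very displeasing"

QUALITY_THRESHOLDS: List[Tuple[float, str]] = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]
QUALITY_FALLBACK = "worst quality"


def score_to_tag(score: float, thresholds: List[Tuple[float, str]], fallback: str) -> str:
    """Return the tag of the first threshold the score strictly exceeds."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback


print(score_to_tag(0.95, AESTHETIC_THRESHOLDS, AESTHETIC_FALLBACK))
print(score_to_tag(0.60, QUALITY_THRESHOLDS, QUALITY_FALLBACK))
```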

Rating Tags:

Tag	Count
general	1,416,451
sensitive	3,447,664
nsfw	427,459
explicit nsfw	336,925

Custom Tags:

Dataset name	Custom tag
image boards	date,
text	The text says "text",
characters	character, series
pixiv	art by Display_Name,
visual novel cg	Full_VN_Name (short_3_letter_name), visual novel cg,
anime wallpaper	date, anime wallpaper,
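
Since the date, aesthetic, quality, and custom tags are all plain caption text, a prompt can be assembled by joining them with content tags. The helper below is a hypothetical sketch, not something shipped with the model. The tag order (date, aesthetic, quality, then content) and the trailing comma are assumptions taken from the example prompt in this card, not a documented requirement.

```python
from typing import Iterable


def build_prompt(
    content_tags: Iterable[str],
    date: str = "newest",
    aesthetic: str = "extremely aesthetic",
    quality: str = "best quality",
) -> str:
    """Join the meta tags and content tags into a comma-separated prompt."""
    tags = [date, aesthetic, quality, *content_tags]
    # The trailing comma mirrors the prompts shown in this card.
    return ", ".join(tags) + ","


prompt = build_prompt(["1girl", "solo", "cat ears", "anime wallpaper"])
print(prompt)
```

The same helper can produce a negative prompt by passing the low-end tags, for example `build_prompt(["monochrome", "realistic"], date="oldest", aesthetic="very displeasing", quality="worst quality")`.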

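Limitations and Bias

Bias

This model is intended for anime illustrations.
Its realistic capabilities are not tested at all.

Limitations

The model can fall back to realistic output.
When this happens, add the "realistic" tag to the negative prompt.
Eyes and hands in far shots can come out poorly.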

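License

SoteDiffusion models are released under the Fair AI Public License 1.0-SD, which is compatible with the Stable Diffusion models' license. Key points:

Modification Sharing: If you modify SoteDiffusion models, you must share both your changes and the original license.
Source Code Accessibility: If your modified version is network-accessible, provide a way (such as a download link) for others to get the source code. This applies to derived models too.
Distribution Terms: Any distribution must be under this license or another with similar rules.
Compliance: Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.

Note: Anything not covered by the Fair AI license is inherited from the Stability AI Non-Commercial license, which is provided as LICENSE_INHERIT.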

A new version is out: https://huggingface.co/Disty0/sotediffusion-v2
