
CogView4-6B

🤗 Space | 🌐 GitHub Repository | 📜 CogView3 Paper

[Figure: example images]

Inference Requirements and Model Introduction

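  • Resolution: width and height must each be between 512 px and 2048 px and divisible by 32, and the total pixel count must not exceed 2^21 (2,097,152) pixels
  • Precision: BF16 / FP32 (FP16 is not supported because it overflows and produces completely black images)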
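
The resolution rule above can be checked before calling the pipeline. The following is a minimal sketch that implements the rule exactly as stated; `check_resolution` and the constant names are hypothetical helpers, not part of diffusers. Two resolutions in the memory table, 1280 * 720 (720 is not a multiple of 32) and 1920 * 1280 (2,457,600 pixels), do not satisfy the rule as written, so the pipeline's actual checks may be looser than this sketch.

```python
# Constraints as stated in the requirements list (names are illustrative).
MIN_SIDE = 512
MAX_SIDE = 2048
MULTIPLE = 32
MAX_PIXELS = 2 ** 21  # 2,097,152


def check_resolution(width: int, height: int) -> list[str]:
    """Return the violated constraints; an empty list means the size is valid."""
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name}={side} is outside [{MIN_SIDE}, {MAX_SIDE}]")
        if side % MULTIPLE != 0:
            problems.append(f"{name}={side} is not divisible by {MULTIPLE}")
    if width * height > MAX_PIXELS:
        problems.append(f"{width}x{height} exceeds {MAX_PIXELS} pixels")
    return problems


if __name__ == "__main__":
    for size in [(1024, 1024), (2048, 1024), (2048, 2048), (1000, 1024)]:
        issues = check_resolution(*size)
        print(size, "OK" if not issues else issues)
```

Returning a list of messages rather than a boolean makes it easy to report every violated constraint at once.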

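GPU memory usage measured with BF16 precision and batchsize=4 is shown in the table below:

| Resolution | Model CPU offload disabled | Model CPU offload enabled | Model CPU offload enabled + 4-bit text encoder quantization |
|---|---|---|---|
| 512 * 512 | 33 GB | 20 GB | 13 GB |
| 1280 * 720 | 35 GB | 20 GB | 13 GB |
| 1024 * 1024 | 35 GB | 20 GB | 13 GB |
| 1920 * 1280 | 39 GB | 20 GB | 14 GB |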
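
As a rough planning aid, the memory figures above can be turned into a lookup that lists which configurations fit a given GPU. This is only a sketch: `VRAM_GB`, `fitting_configs`, and the configuration labels are hypothetical names, the figures apply only to BF16 with batch size 4 as measured above, and the resolutions are assumed to be written as width * height.

```python
# Peak GPU memory in GB, copied from the table (BF16, batch size 4).
# Keys are (width, height); the configuration labels are illustrative.
VRAM_GB = {
    (512, 512): {"no_offload": 33, "cpu_offload": 20, "cpu_offload_4bit_text_encoder": 13},
    (1280, 720): {"no_offload": 35, "cpu_offload": 20, "cpu_offload_4bit_text_encoder": 13},
    (1024, 1024): {"no_offload": 35, "cpu_offload": 20, "cpu_offload_4bit_text_encoder": 13},
    (1920, 1280): {"no_offload": 39, "cpu_offload": 20, "cpu_offload_4bit_text_encoder": 14},
}


def fitting_configs(width: int, height: int, available_gb: float) -> list[str]:
    """List the measured configurations whose memory use fits in available_gb."""
    row = VRAM_GB.get((width, height))
    if row is None:
        raise KeyError(f"No measurement for {width}x{height} in the table")
    return [name for name, needed in row.items() if needed <= available_gb]


if __name__ == "__main__":
    # Example: a 24 GB card at 1024 x 1024.
    print(fitting_configs(1024, 1024, 24))
```

Resolutions that were not measured raise a `KeyError` instead of being interpolated, since the table gives no basis for estimating them.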

Quick Start

First, make sure to install the diffusers library from source.

git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .

Then run the following code:

import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)

# Enable the options below to reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."
image = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview4.png")

Model Metrics

We evaluated the model on several benchmarks and obtained the following scores:

DPG-Bench

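| Model | Overall | Global | Entity | Attribute | Relation | Other |
|---|---|---|---|---|---|---|
| SDXL | 74.65 | 83.27 | 82.43 | 80.91 | 86.76 | 80.41 |
| PixArt-alpha | 71.11 | 74.97 | 79.32 | 78.60 | 82.57 | 76.96 |
| SD3-Medium | 84.08 | 87.90 | 91.01 | 88.83 | 80.70 | 88.68 |
| DALL-E 3 | 83.50 | 90.97 | 89.61 | 88.39 | 90.58 | 89.83 |
| Flux.1-dev | 83.79 | 85.80 | 86.79 | 89.98 | 90.04 | 89.90 |
| Janus-Pro-7B | 84.19 | 86.90 | 88.90 | 89.40 | 89.32 | 89.48 |
| CogView4-6B | 85.13 | 83.85 | 90.35 | 91.17 | 91.14 | 87.29 |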

GenEval

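| Model | Overall | Single Object | Two Objects | Counting | Colors | Position | Color Attribution |
|---|---|---|---|---|---|---|---|
| SDXL | 0.55 | 0.98 | 0.74 | 0.39 | 0.85 | 0.15 | 0.23 |
| PixArt-alpha | 0.48 | 0.98 | 0.50 | 0.44 | 0.80 | 0.08 | 0.07 |
| SD3-Medium | 0.74 | 0.99 | 0.94 | 0.72 | 0.89 | 0.33 | 0.60 |
| DALL-E 3 | 0.67 | 0.96 | 0.87 | 0.47 | 0.83 | 0.43 | 0.45 |
| Flux.1-dev | 0.66 | 0.98 | 0.79 | 0.73 | 0.77 | 0.22 | 0.45 |
| Janus-Pro-7B | 0.80 | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 |
| CogView4-6B | 0.73 | 0.99 | 0.86 | 0.66 | 0.79 | 0.48 | 0.58 |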

T2I-CompBench

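| Model | Color | Shape | Texture | 2D-Spatial | 3D-Spatial | Numeracy | Non-spatial CLIP | Complex 3-in-1 |
|---|---|---|---|---|---|---|---|---|
| SDXL | 0.5879 | 0.4687 | 0.5299 | 0.2133 | 0.3566 | 0.4988 | 0.3119 | 0.3237 |
| PixArt-alpha | 0.6690 | 0.4927 | 0.6477 | 0.2064 | 0.3901 | 0.5058 | 0.3197 | 0.3433 |
| SD3-Medium | 0.8132 | 0.5885 | 0.7334 | 0.3200 | 0.4084 | 0.6174 | 0.3140 | 0.3771 |
| DALL-E 3 | 0.7785 | 0.6205 | 0.7036 | 0.2865 | 0.3744 | 0.5880 | 0.3003 | 0.3773 |
| Flux.1-dev | 0.7572 | 0.5066 | 0.6300 | 0.2700 | 0.3992 | 0.6165 | 0.3065 | 0.3628 |
| Janus-Pro-7B | 0.5145 | 0.3323 | 0.4069 | 0.1566 | 0.2753 | 0.4406 | 0.3137 | 0.3806 |
| CogView4-6B | 0.7786 | 0.5880 | 0.6983 | 0.3075 | 0.3708 | 0.6626 | 0.3056 | 0.3869 |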

Chinese Text Accuracy Evaluation

| Model | Precision | Recall | F1 Score | Pick@4 |
|---|---|---|---|---|
| Kolors | 0.6094 | 0.1886 | 0.2880 | 0.1633 |
| CogView4-6B | 0.6969 | 0.5532 | 0.6168 | 0.3265 |
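
The F1 column above can be sanity-checked from the precision and recall columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall; the source does not state the formula, and `f1_score` is a hypothetical helper. Under that assumption the recomputed values agree with the reported ones to within rounding of the four-decimal inputs.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Kolors": (0.6094, 0.1886, 0.2880),
    "CogView4-6B": (0.6969, 0.5532, 0.6168),
}

if __name__ == "__main__":
    for model, (p, r, f1) in reported.items():
        print(f"{model}: computed F1 = {f1_score(p, r):.4f}, reported F1 = {f1:.4f}")
```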

Citation

🌟 If you find our work helpful, please consider citing our paper and leaving a star.

@article{zheng2024cogview3,
  title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
  author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
  journal={arXiv preprint arXiv:2403.05121},
  year={2024}
}

License

This model is released under the Apache 2.0 License.