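Z-Image-Turbo is a text-to-image diffusion model that generates images matching a given text prompt. It is a distilled version of Z-Image that raises inference speed to 300% of that of conventional models while maintaining very high image fidelity. It suits scenarios such as real-time interaction, game asset generation, and e-commerce visual design.

This model uses the following optimizations. Equivalent optimization: FA. Algorithmic optimization: FA.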
Table 1 Version compatibility

| Component | Version | Environment setup guide |
|---|---|---|
| Python | 3.11.10 | - |
| torch | 2.8.0 | - |
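
Before installing anything else, it can help to confirm that the active environment matches Table 1. The following is a minimal sketch, not part of the model repository: `check_versions` and `base_version` are hypothetical helper names, and the comparison assumes that a local build suffix such as `+cpu` can be ignored.

```python
from importlib import metadata

# Versions copied from Table 1. Only pip-installed packages are checked here.
EXPECTED = {"torch": "2.8.0"}


def base_version(version: str) -> str:
    """Drop a local build suffix, e.g. '2.8.0+cpu' -> '2.8.0'."""
    return version.split("+", 1)[0]


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return {package: (expected, found)} for each mismatched or missing package."""
    problems = {}
    for name, want in expected.items():
        try:
            found = base_version(lookup(name))
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != want:
            problems[name] = (want, found)
    return problems


if __name__ == "__main__":
    import sys

    print("Python", sys.version.split()[0], "(Table 1 lists 3.11.10)")
    print(check_versions() or "all listed packages match Table 1")
```

An empty result means every listed package matches; otherwise each entry shows the expected version next to the one that was found.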

    # Make the packages executable. {version} is the software version, {arch} is the CPU architecture, and {soc} is the Ascend AI processor version.
    chmod +x ./Ascend-cann-toolkit_{version}_linux-{arch}.run
    chmod +x ./Ascend-cann-kernels-{soc}_{version}_linux.run
    # Verify the consistency and integrity of the installation packages
    ./Ascend-cann-toolkit_{version}_linux-{arch}.run --check
    ./Ascend-cann-kernels-{soc}_{version}_linux.run --check
    # Install
    ./Ascend-cann-toolkit_{version}_linux-{arch}.run --install
    ./Ascend-cann-kernels-{soc}_{version}_linux.run --install
    # Set environment variables
    source /usr/local/Ascend/ascend-toolkit/set_env.sh

Download the package:

    wget https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp311-cp311-manylinux_2_28_aarch64.whl

Install command:

    pip3 install torch-2.8.0+cpu-cp311-cp311-manylinux_2_28_aarch64.whl

Download the package:

    wget https://gitcode.com/Ascend/pytorch/releases/download/v7.2.0-pytorch2.8.0/torch_npu-2.8.0-cp311-cp311-manylinux_2_28_aarch64.whl

Install command:

    pip3 install torch_npu-2.8.0-cp311-cp311-manylinux_2_28_aarch64.whl

Run the following command to check whether the PyTorch framework and the torch_npu plugin were installed successfully.

    python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"

Output similar to the following indicates a successful installation.

    tensor([[-0.6066, 6.3385, 0.0379, 3.3356],
            [ 2.9243, 3.3134, -1.5465, 0.1916],
            [-2.1807, 0.2008, -1.1431, 2.1523]], device='npu:0')

If the environment image does not include gcc and g++, install them yourself:

    # On yum-based distributions the g++ compiler is packaged as gcc-c++
    yum install gcc
    yum install gcc-c++
    # Add the C++ header search paths
    export CPLUS_INCLUDE_PATH=/usr/include/c++/12/:/usr/include/c++/12/aarch64-openEuler-linux/:$CPLUS_INCLUDE_PATH
    git clone https://modelers.cn/MindIE/Z-Image-Turbo.git
    pip install git+https://github.com/huggingface/diffusers
    pip install -r requirements.txt

Updating torch_npu can leave it mismatched with the installed mindiesd version. In that case, rebuild mindiesd from source and reinstall it:

    git clone https://gitcode.com/Ascend/MindIE-SD.git && cd MindIE-SD
    python setup.py bdist_wheel
    cd dist
    pip install mindiesd-*.whl

Z-Image-Turbo weights download link (HF)

Z-Image-Turbo weights download link (ModelScope)
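
The `model_index.json` edit described in the next step can also be scripted instead of being done by hand in `vi`. The following is a minimal sketch under one assumption, inferred from the target file shown in this document rather than confirmed by the repository: the only required change is that the `transformer` entry loads from the `zimage` module. `patch_model_index` is a hypothetical helper name.

```python
import json
from pathlib import Path


def patch_model_index(model_path, module="zimage"):
    """Point the 'transformer' entry of model_index.json at `module`.

    Assumption: each component entry is a [library, class_name] pair and
    only the library part needs to change. Returns the updated entry.
    """
    index_file = Path(model_path) / "model_index.json"
    config = json.loads(index_file.read_text(encoding="utf-8"))
    _, class_name = config["transformer"]  # keep the class name untouched
    config["transformer"] = [module, class_name]
    index_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config["transformer"]
```

For example, `patch_model_index("/path/to/Z-Image-Turbo")` rewrites the file in place. Keeping a backup copy of the original file first is a sensible precaution.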
Modify the weights configuration file:

    vi ${model_path}/model_index.json

Make the following changes:

    {
      "_class_name": "ZImagePipeline",
      "_diffusers_version": "0.36.0.dev0",
      "scheduler": [
        "diffusers",
        "FlowMatchEulerDiscreteScheduler"
      ],
      "text_encoder": [
        "transformers",
        "Qwen3Model"
      ],
      "tokenizer": [
        "transformers",
        "Qwen2Tokenizer"
      ],
      "transformer": [
        "zimage",
        "ZImageTransformer2DModel"
      ],
      "vae": [
        "diffusers",
        "AutoencoderKL"
      ]
    }

Set the weights path:

    export model_path="your local Z-Image-Turbo model path"

(1) Original model, single card

    # Set the following environment variables to improve inference performance
    export CPU_AFFINITY_CONF=2
    export TASK_QUEUE_ENABLE=2
    python inference.py \
        --model_path ${model_path} \
        --output_path "./output" \
        --device_id 0 \
        --prompt "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights." \
        --width 1024 \
        --height 1024 \
        --infer_steps 9 \
        --guidance_scale 0.0 \
        --seed 42

Parameter description:
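
One way to read the flags in the command above is as an `argparse` interface. The sketch below is hypothetical and is not the actual code of `inference.py`: the help texts are interpretations of the flag names, and every default value is an assumption taken from the example command.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the example command."""
    p = argparse.ArgumentParser(description="Z-Image-Turbo inference flags (illustrative only)")
    p.add_argument("--model_path", required=True, help="local directory holding the model weights")
    p.add_argument("--output_path", default="./output", help="directory for generated images")
    p.add_argument("--device_id", type=int, default=0, help="index of the NPU to run on")
    p.add_argument("--prompt", required=True, help="text description of the image to generate")
    p.add_argument("--width", type=int, default=1024, help="output image width in pixels")
    p.add_argument("--height", type=int, default=1024, help="output image height in pixels")
    p.add_argument("--infer_steps", type=int, default=9, help="number of sampling iterations")
    p.add_argument("--guidance_scale", type=float, default=0.0, help="guidance strength (0.0 in the example)")
    p.add_argument("--seed", type=int, default=42, help="random seed for reproducible output")
    return p


if __name__ == "__main__":
    # Parse a short example instead of sys.argv so the sketch runs standalone.
    args = build_parser().parse_args(["--model_path", "/path/to/weights", "--prompt", "a red lantern"])
    print(vars(args))
```

Run `python inference.py --help` in the repository to see the authoritative list of options.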
(2) Single card + fused operators

    # Set the following environment variables to improve inference performance
    export CPU_AFFINITY_CONF=2
    export TASK_QUEUE_ENABLE=2
    # Set to 1 to enable the LaserAttention fused operator
    export FA_FUSE=1
    # Set to 1 to enable the AdaLn fused operator
    export ADALN_FUSE=0
    # Set to 1 to enable the Rope fused operator
    export ROPE_FUSE=0
    # Set to 1 to convert Matmul operators to NZ format
    export USE_NZ=0
    python inference.py \
        --model_path ${model_path} \
        --output_path "./output" \
        --device_id 0 \
        --prompt "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights." \
        --width 1024 \
        --height 1024 \
        --infer_steps 9 \
        --guidance_scale 0.0 \
        --seed 42

(3) Ulysses2 + operator optimization + communication-computation overlap

    # Set the following environment variables to improve inference performance
    export CPU_AFFINITY_CONF=2
    export TASK_QUEUE_ENABLE=2
    # Set to 1 to enable the LaserAttention fused operator
    export FA_FUSE=1
    # Set to 1 to enable the AdaLn fused operator
    export ADALN_FUSE=0
    # Set to 1 to enable the Rope fused operator
    export ROPE_FUSE=0
    # Set to 1 to convert Matmul operators to NZ format
    export USE_NZ=0
    # Set to 1 to enable communication-computation overlap
    export COMM_OVERLAP=1
    ASCEND_RT_VISIBLE_DEVICES=1,2 torchrun --master_port=20095 --nproc_per_node=2 inference.py \
        --model_path ${model_path} \
        --output_path "./output" \
        --prompt "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights." \
        --width 1024 \
        --height 1024 \
        --infer_steps 9 \
        --guidance_scale 0.0 \
        --seed 42 \
        --sequence_parallel

| Hardware | CPU | Batch size | Resolution | Iterations | Performance | Sampler | Notes |
|---|---|---|---|---|---|---|---|
| Atlas 800I A2 (8×64G) | 48 cores (Arm) | 1 | 1024×1024 | 9 | 3.7 s | FlowMatchEulerDiscreteScheduler | Single card |
| Atlas 800I A2 (8×64G) | 48 cores (Arm) | 1 | 1024×1024 | 9 | 3.1 s | FlowMatchEulerDiscreteScheduler | Single card + fused operators |
| Atlas 800I A2 (8×64G) | 48 cores (Arm) | 1 | 1024×1024 | 9 | 2.4 s | FlowMatchEulerDiscreteScheduler | SP2 + fused operators |
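
To make the table easier to compare, the latencies can be expressed as speedups over the plain single-card run. The short calculation below uses only the three figures from the table; the `speedups` helper and the configuration labels are illustrative names.

```python
# Per-image latency in seconds, copied from the performance table above.
latency_s = {
    "single card": 3.7,
    "single card + fused operators": 3.1,
    "SP2 + fused operators": 2.4,
}


def speedups(latencies, baseline="single card"):
    """Return each configuration's speedup relative to the baseline latency."""
    base = latencies[baseline]
    return {name: round(base / t, 2) for name, t in latencies.items()}


for name, factor in speedups(latency_s).items():
    print(f"{name}: {factor:.2f}x")
```

With these numbers, fused operators alone give roughly a 1.19x speedup, and adding two-card sequence parallelism brings the total to roughly 1.54x over the baseline.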