FLUX.2 [dev] is a 32-billion-parameter rectified flow Transformer model that can generate, edit, and compose images from text instructions.
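In rectified flow, samples travel along straight lines between noise and data, x_t = (1 - t)·x_0 + t·x_1. The velocity the transformer learns to predict is therefore ideally x_1 - x_0, and generation integrates dx/dt = v(x, t) numerically. The Euler integration below is a toy sketch of that idea only. The "model" is a hypothetical oracle that already knows the target, the time convention (t = 0 is noise, t = 1 is data) was chosen for readability, and nothing here reflects FLUX.2's actual scheduler settings.

```python
from typing import Callable, List

Vector = List[float]


def oracle_velocity(x0: Vector, x1: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for the trained transformer.

    On the straight path x_t = (1 - t) * x0 + t * x1 the true velocity
    dx/dt is the constant x1 - x0, independent of x and t.
    """
    v = [b - a for a, b in zip(x0, x1)]
    return lambda x, t: v


def euler_sample(noise: Vector, velocity: Callable[[Vector, float], Vector], steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 (noise) to t = 1 (data) with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise = [0.5, -1.0, 2.0]
target = [1.0, 1.0, 1.0]
sample = euler_sample(noise, oracle_velocity(noise, target), steps=4)
print([round(s, 6) for s in sample])  # → [1.0, 1.0, 1.0]
```

Because the oracle's path is perfectly straight, even four Euler steps land exactly on the target. With a learned, only approximately straight velocity field, the step count starts to matter.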
Table 1 Hardware requirements
| Device model | NPU configuration |
|---|---|
| Atlas 800I A2 | 8 × 64 GB |
| Atlas 800T A2 | 8 × 64 GB |
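A back-of-the-envelope estimate shows why memory per card matters here. Each of the 32 billion parameters takes 2 bytes in BF16 and 1 byte as INT8, which is the weight format both w8a16 and w8a8 use. The sketch counts transformer weights only. Treating these as the dominant cost is an assumption: the text encoder, VAE, activations, quantization scales and runtime buffers all come on top.

```python
PARAMS = 32e9       # parameter count stated in the model description
NPU_MEMORY_GB = 64  # per-NPU memory from Table 1

# Bytes per stored weight: BF16 is 2 bytes; w8a16 and w8a8 both store INT8 weights (1 byte).
BYTES_PER_PARAM = {"bf16": 2, "int8 weights": 1}


def weight_footprint_gb(params: float, bytes_per_param: int) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes), ignoring scales and metadata."""
    return params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    gb = weight_footprint_gb(PARAMS, nbytes)
    print(f"{dtype:>12}: {gb:5.1f} GB = {gb / NPU_MEMORY_GB:.0%} of one 64 GB NPU")
```

This gives 64 GB for BF16 weights, enough to fill one card by themselves, and 32 GB for INT8 weights.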
Table 2 Software version compatibility
| Component | Version | Environment setup guide |
|---|---|---|
| cann | cann-8.5.0 | - |
| Python | Python 3.11.6 | - |
| torch | 2.8.0 | - |
| torch_npu | 2.8.0 | - |
| transformers | 4.51.0 | - |
| mindie | 2.3.0 | - |
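1. Run the following command to pull the image:
docker pull swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.3.0-800I-A2-py311-openeuler24.03-lts
2. Run the following command to check whether the image was pulled successfully:
docker images | grep "2.3.0-800I"
FLUX.2-dev weights and configuration files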
| Model | Weights |
|---|---|
| FLUX.2-dev | Hugging Face download link |
docker run -itd -u root \
--net=host \
--privileged=true \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root:/root \
-p 8001:8001 \
--shm-size 1024g \
--name mindie-wlh2 \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.3.0-800I-A2-py311-openeuler24.03-lts
docker exec -it -u root mindie-wlh2 bash
cd /
git clone https://modelers.cn/MindIE/FLUX.2-dev.git
pip install torch==2.8.0
pip install torch_npu==2.8.0
pip install torchvision==0.23.0
pip install diffusers==0.36.0
pip install huggingface-hub==0.36.0
pip uninstall mindiesd
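Before building MindIE-SD, it can be worth confirming that the pinned packages actually landed in the container. The sketch below uses the standard-library `importlib.metadata`. The expected versions are copied from Table 2 and the `pip install` commands above, and package names are looked up exactly as spelled there. Local build suffixes such as `+cpu` are ignored, but other differences (for example a `.post1` release) are reported as mismatches.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Pins taken from Table 2 and the pip install commands in this guide.
EXPECTED = {
    "torch": "2.8.0",
    "torch_npu": "2.8.0",
    "torchvision": "0.23.0",
    "transformers": "4.51.0",
    "diffusers": "0.36.0",
    "huggingface-hub": "0.36.0",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def check_versions(expected: Dict[str, str],
                   lookup: Callable[[str], Optional[str]] = installed_version) -> Dict[str, str]:
    """Map each package that does not match its pin to a short problem description."""
    problems = {}
    for dist, want in expected.items():
        got = lookup(dist)
        if got is None:
            problems[dist] = f"not installed (expected {want})"
        elif got.split("+")[0] != want:  # ignore local suffixes such as +cpu
            problems[dist] = f"found {got}, expected {want}"
    return problems


if __name__ == "__main__":
    issues = check_versions(EXPECTED)
    for dist, msg in issues.items():
        print(f"{dist}: {msg}")
    print("all pinned versions match" if not issues else f"{len(issues)} mismatch(es)")
```

The `lookup` parameter exists so the comparison logic can be exercised without the real packages installed.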
cd /
git clone https://gitcode.com/Ascend/MindIE-SD.git && cd MindIE-SD
git checkout 5c574baec9f911747d4906d8d40748eda97772cb
python setup.py bdist_wheel
cd dist
pip install mindiesd-*.whl
Pipeline configuration (`model_index.json`):
{
"_class_name": "Flux2Pipeline",
"_diffusers_version": "0.36.0.dev0",
"scheduler": [
"diffusers",
"FlowMatchEulerDiscreteScheduler"
],
"text_encoder": [
"transformers",
"Mistral3ForConditionalGeneration"
],
"tokenizer": [
"transformers",
"PixtralProcessor"
],
"transformer": [
"FLUX2dev",
"Flux2Transformer2DModel"
],
"vae": [
"diffusers",
"AutoencoderKLFlux2"
]
}
Modify the weight path in the script.
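The configuration above follows the diffusers `model_index.json` format, in which each component maps to a `[library, class]` pair. Every entry points to `diffusers` or `transformers` except `transformer`, which names `FLUX2dev`. Presumably this refers to code from the cloned FLUX.2-dev repository, though the guide does not say so explicitly. The sketch below uses only the standard `json` module to list the components and flag the non-standard one, which can serve as a quick sanity check after editing the file. The path in it is a placeholder.

```python
import json
from pathlib import Path
from typing import Dict, Tuple

STANDARD_LIBRARIES = {"diffusers", "transformers"}


def load_components(config: dict) -> Dict[str, Tuple[str, str]]:
    """Return {component: (library, class)} for every [library, class] entry."""
    components = {}
    for key, value in config.items():
        if key.startswith("_"):  # metadata such as _class_name and _diffusers_version
            continue
        library, class_name = value
        components[key] = (library, class_name)
    return components


def custom_components(config: dict) -> Dict[str, Tuple[str, str]]:
    """Components loaded from a library other than diffusers or transformers."""
    return {name: entry for name, entry in load_components(config).items()
            if entry[0] is not None and entry[0] not in STANDARD_LIBRARIES}


if __name__ == "__main__":
    # Placeholder path: adjust it to wherever your model_index.json lives.
    path = Path("/path/to/FLUX.2-dev/model_index.json")
    if path.is_file():
        config = json.loads(path.read_text(encoding="utf-8"))
        for name, (library, class_name) in load_components(config).items():
            print(f"{name:>12}: {library}.{class_name}")
        print("non-standard components:", custom_components(config))
    else:
        print(f"{path} not found; update the path first")
```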
Modify the weight path in the script. For unquantized inference, comment out `--quant_desc_path ${quant_desc_path}`.
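Commenting out `--quant_desc_path ${quant_desc_path}` simply removes the quantization description from the command line. If you drive the run from Python rather than editing a shell script, the same switch amounts to adding the flag only when a path is given. This is a minimal sketch: the script name `run_flux2.py`, the `--path` argument, and `quant_desc.json` are hypothetical placeholders, and only `--quant_desc_path` comes from this guide.

```python
import shlex
import sys
from typing import List, Optional


def build_command(model_path: str, quant_desc_path: Optional[str] = None) -> List[str]:
    """Build the argv list, adding --quant_desc_path only for quantized runs.

    "run_flux2.py" and "--path" are hypothetical placeholders for the real
    inference script and its weight-path argument.
    """
    cmd = [sys.executable, "run_flux2.py", "--path", model_path]
    if quant_desc_path is not None:
        cmd += ["--quant_desc_path", quant_desc_path]
    return cmd


# Unquantized run: the flag is absent, equivalent to commenting it out.
print(shlex.join(build_command("/path/to/FLUX.2-dev")))
# Quantized run: "quant_desc.json" is a hypothetical file name.
print(shlex.join(build_command("/path/to/FLUX.2-dev", "/path/to/quant_desc.json")))
```

To launch the run, pass the list to `subprocess.run(cmd, check=True)`.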
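This release supports two quantization modes: w8a16 and w8a8. w8a16 reduces memory usage but gives little speedup. w8a8 speeds up inference but has some impact on accuracy. Choose the mode that suits your use case.
w8a8 quantization command: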
python ./quant_flux2.py \
--model_name /path/to/FLUX.2-dev \
--device_id 0 \
--quant_mode w8a8 \
--w_sym \
--act_method 3 \
--quant_save_dir ./quant_w8a8_withoutData_use_disable_quant_layers \
--is_dynamic
w8a16 quantization command:
python ./quant_flux2.py \
--model_name /path/to/FLUX.2-dev \
--device_id 0 \
--quant_mode w8a16 \
--w_sym \
--act_method 3 \
--quant_save_dir ./quant_w8a16_withoutData_use_disable_quant_layers
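The w8a16 / w8a8 trade-off comes down to what gets quantized. In w8a16 only the weights are stored as INT8 while activations stay 16-bit. In w8a8 the activations are quantized too, so the matrix multiply itself can run in integer arithmetic. The sketch below is a generic illustration of symmetric per-channel weight quantization combined with per-token dynamic activation quantization. Reading `--w_sym` as "symmetric weights" and `--is_dynamic` as "activation scales computed at run time" is an assumption based on the flag names. `quant_flux2.py` may differ in granularity, rounding, and calibration, and `--act_method` is not modelled at all.

```python
from typing import List, Tuple

Matrix = List[List[float]]
QMAX = 127  # symmetric INT8 range used here: [-127, 127]


def quantize_rows(m: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row INT8 quantization: q = round(v / s) with s = max|v| / 127."""
    q_rows, scales = [], []
    for row in m:
        scale = max(abs(v) for v in row) / QMAX or 1.0  # all-zero row: any scale works
        q_rows.append([max(-QMAX, min(QMAX, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def linear(x: Matrix, w: Matrix) -> Matrix:
    """Full-precision reference: y = x @ w.T (w holds one row per output channel)."""
    return [[sum(a * b for a, b in zip(xi, wj)) for wj in w] for xi in x]


def w8a16_linear(x: Matrix, w: Matrix) -> Matrix:
    """INT8 weights (one scale per output channel); activations stay in floating point."""
    qw, sw = quantize_rows(w)
    return [[sum(a * q for a, q in zip(xi, qwj)) * swj for qwj, swj in zip(qw, sw)]
            for xi in x]


def w8a8_linear(x: Matrix, w: Matrix) -> Matrix:
    """INT8 weights and INT8 activations; activation scales are computed per call (dynamic)."""
    qw, sw = quantize_rows(w)  # weights: can be quantized once, offline
    qx, sx = quantize_rows(x)  # activations: one scale per token, from the live input
    return [[sum(a * b for a, b in zip(qxi, qwj)) * sxi * swj  # integer dot product, then rescale
             for qwj, swj in zip(qw, sw)]
            for qxi, sxi in zip(qx, sx)]


def max_abs_err(a: Matrix, b: Matrix) -> float:
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


x = [[0.5, -1.2, 3.0], [2.0, 0.1, -0.7]]   # two "tokens", three input features
w = [[0.3, -0.8, 0.05], [1.5, 0.2, -0.4]]  # two output channels
ref = linear(x, w)
err16 = max_abs_err(ref, w8a16_linear(x, w))
err8 = max_abs_err(ref, w8a8_linear(x, w))
print(f"w8a16 max abs error: {err16:.4f}")
print(f"w8a8  max abs error: {err8:.4f}")
```

In w8a16 the integer weights are rescaled and multiplied with floating-point activations, so the saving is mainly memory. In w8a8 the inner product runs entirely on integers, which is where the speedup can come from. The price is a second source of rounding error from the activations. The error values printed here say nothing about FLUX.2's actual accuracy.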