Table 1. Version compatibility
| Component | Version | Environment setup guide |
|---|---|---|
| Python | 3.11.10 | - |
| torch | 2.9.0 | - |
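
Before installing the remaining packages, it may help to confirm that the local interpreter and torch build line up with Table 1. The following is a minimal sketch, not part of the original guide: `version_matches` is a hypothetical helper, and comparing only the major.minor components is an assumption about how strict the match needs to be.

```shell
# Hypothetical helper: succeeds when an installed version string agrees with
# the expected version from Table 1 on its major.minor components.
# Assumption: patch-level differences (e.g. 3.11.4 vs 3.11.10) are acceptable.
version_matches() {
    installed="$1"   # e.g. "3.11.10" or "2.9.0+cpu"
    expected="$2"    # e.g. "3.11.10"
    # Keep only "major.minor" of each version before comparing.
    installed_mm=$(printf '%s' "$installed" | cut -d. -f1,2)
    expected_mm=$(printf '%s' "$expected" | cut -d. -f1,2)
    [ "$installed_mm" = "$expected_mm" ]
}
```

One possible use is to pass it the output of `python3 -c 'import platform; print(platform.python_version())'` together with `3.11.10`, and the output of `python3 -c 'import torch; print(torch.__version__)'` together with `2.9.0`.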
Note:
# Make the packages executable. {version} is the software version, {arch} is the CPU architecture, and {soc} is the Ascend AI processor version.
chmod +x ./Ascend-cann-toolkit_{version}_linux-{arch}.run
chmod +x ./Ascend-cann-kernels-{soc}_{version}_linux.run
# Verify the consistency and integrity of the installation packages
./Ascend-cann-toolkit_{version}_linux-{arch}.run --check
./Ascend-cann-kernels-{soc}_{version}_linux.run --check
# Install
./Ascend-cann-toolkit_{version}_linux-{arch}.run --install
./Ascend-cann-kernels-{soc}_{version}_linux.run --install
# Set environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

# Make the package executable. {version} is the software version and {arch} is the CPU architecture.
chmod +x ./Ascend-mindie_${version}_linux-${arch}.run
./Ascend-mindie_${version}_linux-${arch}.run --check
# Option 1: install to the default path
./Ascend-mindie_${version}_linux-${arch}.run --install
# Set environment variables
cd /usr/local/Ascend/mindie && source set_env.sh
# Option 2: install to a custom path
./Ascend-mindie_${version}_linux-${arch}.run --install-path=${AieInstallPath}
# Set environment variables
cd ${AieInstallPath}/mindie && source set_env.sh

Download pytorch_v{pytorchversion}_py{pythonversion}.tar.gz
tar -xzvf pytorch_v{pytorchversion}_py{pythonversion}.tar.gz
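# The extracted archive contains the whl package
pip install torch_npu-{pytorchversion}.xxxx.{arch}.whl

# If the environment image does not include gcc and g++, install them yourself
yum install gcc
yum install g++
# Add the header search paths
export CPLUS_INCLUDE_PATH=/usr/include/c++/12/:/usr/include/c++/12/aarch64-openEuler-linux/:$CPLUS_INCLUDE_PATH

Note: when using an openEuler image, the gcc and g++ environment must be configured; otherwise the build fails with fatal error: 'stdio.h' file not found.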
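
To see up front whether the compilers are present, one option is a small PATH check before running any model code. This is only a sketch: `missing_tools` is a hypothetical helper name, and it checks for the commands on PATH only, not for a working header setup.

```shell
# Hypothetical helper: print the names of the given commands that are not
# found on PATH, separated by spaces (prints an empty line if none are missing).
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space before printing.
    printf '%s\n' "${missing# }"
}
```

Calling `missing_tools gcc g++` on the target image prints whichever of the two still needs to be installed.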
| Model | Link |
|---|---|
| Wan2.2-T2V-A14B | 🤗huggingface |
| Wan2.2-I2V-A14B | 🤗huggingface |
| Wan2.2-TI2V-5B | 🤗huggingface |
| Model | Supported resolutions |
|---|---|
| Wan2.2-T2V-A14B | 720*1280, 1280*720, 480*832, 832*480 |
| Wan2.2-I2V-A14B | 720*1280, 1280*720, 480*832, 832*480 |
| Wan2.2-TI2V-5B | 704*1280, 1280*704 |
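
Since an unsupported `--size` value is easy to pass by accident, a launch script could validate it against the table above first. The sketch below copies the resolutions from that table and uses the `--task` names that appear in the commands in this document; the function names `supported_sizes` and `is_supported_size` are hypothetical and not part of the repository.

```shell
# Print the supported sizes for a task, as listed in the resolution table.
supported_sizes() {
    case "$1" in
        t2v-A14B|i2v-A14B) echo "720*1280 1280*720 480*832 832*480" ;;
        ti2v-5B)           echo "704*1280 1280*704" ;;
        *)                 echo "" ;;
    esac
}

# Succeed only when the given size is listed for the given task.
is_supported_size() {
    task="$1"
    size="$2"
    [ -n "$size" ] || return 1
    # Pad with spaces so that only whole entries can match.
    sizes=" $(supported_sizes "$task") "
    case "$sizes" in
        *" $size "*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

For example, `is_supported_size ti2v-5B '1280*704' || echo "unsupported size"` would stay silent, while `1280*720` for the same task would print the message. Quoting the size keeps the shell from treating `*` as a glob.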
git clone https://modelers.cn/MindIE/Wan2.2.git
cd Wan2.2
pip3 install -r requirements.txt

Use the weights downloaded in the previous step:
model_base="./Wan2.2-T2V-A14B/"

Run the command:
export ALGO=1
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export FAST_LAYERNORM=1
torchrun --nproc_per_node=8 --master_port=23459 generate.py \
--task t2v-A14B \
--ckpt_dir ${model_base} \
--size 1280*720 \
--frame_num 81 \
--sample_steps 40 \
--dit_fsdp \
--t5_fsdp \
--cfg_size 2 \
--ulysses_size 4 \
--vae_parallel \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--base_seed 0
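
In every multi-card command in this document, `--nproc_per_node` equals `--cfg_size` multiplied by `--ulysses_size` (8 = 2 × 4, 16 = 2 × 8, 8 = 1 × 8). Assuming that relation is required, which is inferred from the examples and not stated explicitly, a launch script could check it before calling torchrun. `check_parallel_layout` is a hypothetical helper name.

```shell
# Hypothetical pre-launch check.
# Assumption: the process count must equal cfg_size * ulysses_size,
# as in all the example commands in this document.
check_parallel_layout() {
    nproc="$1"     # value passed to --nproc_per_node
    cfg="$2"       # value passed to --cfg_size
    ulysses="$3"   # value passed to --ulysses_size
    [ "$((cfg * ulysses))" -eq "$nproc" ]
}
```

For instance, `check_parallel_layout 8 2 4 || exit 1` placed before the torchrun line would stop the script early if the three values drift apart.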
Parameter description:
Recommended configuration:
torchrun --nproc_per_node=16 generate.py \
--cfg_size 2 \
--ulysses_size 8 \

Run the command:
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export FAST_LAYERNORM=1
torchrun --nproc_per_node=8 --master_port=23459 generate.py \
--task t2v-A14B \
--ckpt_dir ${model_base} \
--size 1280*720 \
--frame_num 81 \
--sample_steps 40 \
--dit_fsdp \
--t5_fsdp \
--cfg_size 2 \
--ulysses_size 4 \
--vae_parallel \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--use_rainfusion \
--rainfusion_type "v2" \
--sparsity 0.8 \
--sparse_start_step 15 \
--base_seed 0

Parameter description:
Use the weights downloaded in the previous step:
model_base="./Wan2.2-I2V-A14B/"

Run the command:
export ALGO=1
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export FAST_LAYERNORM=1
torchrun --nproc_per_node=8 generate.py \
--task i2v-A14B \
--ckpt_dir ${model_base} \
--size 1280*720 \
--frame_num 81 \
--sample_steps 40 \
--dit_fsdp \
--t5_fsdp \
--cfg_size 2 \
--ulysses_size 4 \
--vae_parallel \
--image examples/i2v_input.JPG \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
--base_seed 0

Parameter description:
Recommended configuration:
torchrun --nproc_per_node=16 generate.py \
--cfg_size 2 \
--ulysses_size 8 \

Run the command:
export ALGO=1
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export FAST_LAYERNORM=1
torchrun --nproc_per_node=8 generate.py \
--task i2v-A14B \
--ckpt_dir ${model_base} \
--size 1280*720 \
--frame_num 81 \
--sample_steps 40 \
--dit_fsdp \
--t5_fsdp \
--cfg_size 2 \
--ulysses_size 4 \
--vae_parallel \
--image examples/i2v_input.JPG \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
--use_rainfusion \
--rainfusion_type "v2" \
--sparsity 0.8 \
--sparse_start_step 15 \
--base_seed 0

Parameter description:
Use the weights downloaded in the previous step:
model_base="./Wan2.2-TI2V-5B/"

Run the command:
export ALGO=1
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export FAST_LAYERNORM=1
python generate.py \
--task ti2v-5B \
--ckpt_dir ${model_base} \
--size 1280*704 \
--frame_num 121 \
--sample_steps 50 \
--image examples/i2v_input.JPG \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
--offload_model False \
--base_seed 0

Parameter description:
Run the command:
export ALGO=1
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
torchrun --nproc_per_node=8 generate.py \
--task ti2v-5B \
--ckpt_dir ${model_base} \
--size 1280*704 \
--frame_num 121 \
--sample_steps 50 \
--dit_fsdp \
--t5_fsdp \
--cfg_size 2 \
--ulysses_size 4 \
--vae_parallel \
--image examples/i2v_input.JPG \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
--base_seed 0

Parameter description:
Recommended configuration:
torchrun --nproc_per_node=16 generate.py \
--cfg_size 2 \
--ulysses_size 8 \

Recommended configuration:
--use_attentioncache \
--start_step 20 \
--attentioncache_interval 2 \
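--end_step 47

Parameter description:
Adds W8A8_dynamic quantization support for Wan2.2-T2V, Wan2.2-I2V, and Wan2.2-TI2V. Quantization is applied to the DiT model, which reduces device memory usage and improves inference performance.
Download and install the msmodelslim tool: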
git clone https://gitcode.com/Ascend/msit
cd msit/msmodelslim
bash install.sh

Taking the Wan2.2-T2V-A14B model as an example, export the W8A8 quantized weights and description file for the DiT:
cd /path/to/Wan2.2
model_base="./Wan2.2-T2V-A14B/"
python quant_wan22.py \
--task t2v-A14B \
--ckpt_dir ${model_base} \
--quant_dit_path ./quant_w8a8_dynamic \
--quant_type W8A8 \
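--is_dynamic

Parameter description:
After the command finishes, two folders are generated under the quant_w8a8_dynamic directory:
- high_noise_model
  - quant_model_description_w8a8_dynamic.json: quantization configuration description file
  - quant_model_weight_w8a8_dynamic.safetensors: quantized weight file
- low_noise_model
  - quant_model_description_w8a8_dynamic.json: quantization configuration description file
  - quant_model_weight_w8a8_dynamic.safetensors: quantized weight file

Taking the Wan2.2-T2V-A14B model as an example, run quantized inference: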
export ALGO=1
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export FAST_LAYERNORM=1
model_base="./Wan2.2-T2V-A14B/"
quant_dit_path="./quant_w8a8_dynamic/"
torchrun --nproc_per_node=8 --master_port=23459 generate.py \
--task t2v-A14B \
--ckpt_dir ${model_base} \
--quant_dit_path ${quant_dit_path} \
--size 1280*720 \
--frame_num 81 \
--sample_steps 40 \
--dit_fsdp \
--t5_fsdp \
--cfg_size 1 \
--ulysses_size 8 \
--vae_parallel \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--base_seed 0

Parameter description:
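
Because quantized inference reads from the directory passed as `--quant_dit_path`, it can be worth confirming that the export step produced the layout described earlier (a high_noise_model and a low_noise_model folder, each holding a description JSON and a safetensors file) before launching an 8-card job. The following is a minimal sketch; `check_quant_dir` is a hypothetical helper and not part of the repository.

```shell
# Hypothetical helper: verify that a quantized-weight directory contains the
# two sub-folders and the two files per folder listed in this document.
check_quant_dir() {
    root="$1"     # e.g. ./quant_w8a8_dynamic
    suffix="$2"   # e.g. w8a8_dynamic
    for sub in high_noise_model low_noise_model; do
        for f in "quant_model_description_${suffix}.json" \
                 "quant_model_weight_${suffix}.safetensors"; do
            if [ ! -f "$root/$sub/$f" ]; then
                echo "missing: $root/$sub/$f" >&2
                return 1
            fi
        done
    done
    return 0
}
```

A typical call would be `check_quant_dir ./quant_w8a8_dynamic w8a8_dynamic || exit 1` just ahead of the torchrun command.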
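Download and install the msmodelslim tool:
# 1. Clone the msmodelslim repository
git clone https://gitcode.com/Ascend/msmodelslim.git
# 2. Enter the msmodelslim directory and run the install script
cd msmodelslim
bash install.sh

Taking the Wan2.2-T2V-A14B model as an example, export the W8A8 MXFP8 quantized weights and description file for the DiT: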
cd /path/to/Wan2.2
# Set the weight path
model_base="./Wan2.2-T2V-A14B/"
# Set the path where the quantized weights are saved
save_path="./quant_w8a8_mxfp8"
msmodelslim quant \
--model_path ${model_base} \
--save_path ${save_path} \
--device npu \
--model_type Wan2.2 \
--trust_remote_code True \
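--config_path msmodelslim/lab_practice/wan2_2/wan2_2_w8a8_mxfp8_t2v.yaml

Parameter description:
After the command finishes, two folders are generated under the "./quant_w8a8_mxfp8" directory:
- t2v_high_noise_model
  - quant_model_description_w8a8_mxfp8.json: quantization configuration description file
  - quant_model_weight_w8a8_mxfp8.safetensors: quantized weight file
- t2v_low_noise_model
  - quant_model_description_w8a8_mxfp8.json: quantization configuration description file
  - quant_model_weight_w8a8_mxfp8.safetensors: quantized weight file

Rename the folders to match the path the model loads from:
cd ./quant_w8a8_mxfp8
mv t2v_high_noise_model high_noise_model
mv t2v_low_noise_model low_noise_model

Taking the Wan2.2-T2V-A14B model as an example, run quantized inference: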
export ALGO=0
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export FAST_LAYERNORM=1
model_base="./Wan2.2-T2V-A14B/"
quant_dit_path="./quant_w8a8_mxfp8/"
torchrun --nproc_per_node=8 --master_port=23459 generate.py \
--task t2v-A14B \
--ckpt_dir ${model_base} \
--quant_dit_path ${quant_dit_path} \
--size 1280*720 \
--frame_num 81 \
--sample_steps 40 \
--cfg_size 1 \
--ulysses_size 8 \
--vae_parallel \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
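--base_seed 0

Download and install the msmodelslim tool:
# 1. Clone the msmodelslim repository
git clone https://gitcode.com/Ascend/msmodelslim.git
# 2. Enter the msmodelslim directory and run the install script
cd msmodelslim
bash install.sh

Taking the Wan2.2-T2V-A14B model as an example, export the W8A8 MXFP8 + attention FP8 quantized weights and description file for the DiT: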
cd /path/to/Wan2.2
# Set the weight path
model_base="./Wan2.2-T2V-A14B/"
# Set the path where the quantized weights are saved
save_path="./quant_w8a8c8_mxfp8"
msmodelslim quant \
--model_path ${model_base} \
--save_path ${save_path} \
--device npu \
--model_type Wan2.2 \
--trust_remote_code True \
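--config_path msmodelslim/lab_practice/wan2_2/wan2_2_w8a8c8_mxfp8_t2v.yaml

Parameter description:
After the command finishes, two folders are generated under the "./quant_w8a8c8_mxfp8" directory:
- t2v_high_noise_model
  - quant_model_description_w8a8_mxfp8.json: quantization configuration description file
  - quant_model_weight_w8a8_mxfp8.safetensors: quantized weight file
- t2v_low_noise_model
  - quant_model_description_w8a8_mxfp8.json: quantization configuration description file
  - quant_model_weight_w8a8_mxfp8.safetensors: quantized weight file

Rename the folders to match the path the model loads from:
cd ./quant_w8a8c8_mxfp8
mv t2v_high_noise_model high_noise_model
mv t2v_low_noise_model low_noise_model

Taking the Wan2.2-T2V-A14B model as an example, run quantized inference: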
export ALGO=3
export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export FAST_LAYERNORM=1
model_base="./Wan2.2-T2V-A14B/"
quant_dit_path="./quant_w8a8c8_mxfp8/"
torchrun --nproc_per_node=8 --master_port=23459 generate.py \
--task t2v-A14B \
--ckpt_dir ${model_base} \
--quant_dit_path ${quant_dit_path} \
--size 1280*720 \
--frame_num 81 \
--sample_steps 40 \
--cfg_size 1 \
--ulysses_size 8 \
--vae_parallel \
--prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
--base_seed 0

Parameter description:

| Model | Resolution | Frames | Sampling steps | Cards | E2E time |
|---|---|---|---|---|---|
| Wan2.2-T2V-A14B | 1280×720 | 81 | 40 | 8 | 435.99s |
| Wan2.2-I2V-A14B | 1280×720 | 81 | 40 | 8 | 436.42s |
| Wan2.2-TI2V-5B | 1280×704 | 121 | 50 | 8 | 72.21s |
- To reduce device memory usage, set `export T5_LOAD_CPU=1`.
- If the error `Directory operation failed. Reason: Directory [/usr/local/Ascend/mindie/latest/mindie-rt/aoe] does not exist` appears, run `unset TUNE_BANK_PATH`.
- If the error `fatal error: 'stdio.h' file not found` appears, see section 1.6 on installing gcc and g++.
- If the error `Failed to bind the IP port. Reason: The IP address and port have been bound already.` appears:
  - `HCCL function error :HcclGetRootInfo(&hcclID), error code is 7`: set `export HCCL_HOST_SOCKET_PORT_RANGE="auto"` so that no fixed port is specified.
  - `HCCL function error :HcclGetRootInfo(&hcclID), error code is 11`: run `sysctl -w net.ipv4.ip_local_reserved_ports=60000-60015` to reserve the ports.
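
After reserving ports with sysctl, it can be useful to confirm that a given port really falls inside the reserved list. The sketch below parses the comma-separated list format that `net.ipv4.ip_local_reserved_ports` uses (single ports and `low-high` ranges); `port_is_reserved` is a hypothetical helper name, and it assumes the list contains only well-formed numeric entries.

```shell
# Hypothetical helper: succeed when PORT lies inside a reserved-port list
# such as "60000-60015" or "8080,60000-60015".
port_is_reserved() {
    port="$1"
    found=1
    old_ifs="$IFS"
    IFS=','
    for entry in $2; do
        # A single port has no dash, so both expansions return it unchanged.
        low="${entry%-*}"
        high="${entry#*-}"
        if [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]; then
            found=0
        fi
    done
    IFS="$old_ifs"
    return "$found"
}
```

On a live system the second argument could come from `sysctl -n net.ipv4.ip_local_reserved_ports`, for example `port_is_reserved 60000 "$(sysctl -n net.ipv4.ip_local_reserved_ports)"`.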