Table 1 Version compatibility
| Component | Version | Environment setup guide |
|---|---|---|
| Python | 3.10/3.11 | - |
| torch | 2.9.0 | - |
Note:
# Add execute permission to the packages. {version} is the software version, {arch} is the CPU architecture, and {soc} is the Ascend AI processor version.
chmod +x ./Ascend-cann-toolkit_{version}_linux-{arch}.run
chmod +x ./Ascend-cann-kernels-{soc}_{version}_linux.run
# Verify the consistency and integrity of the package files
./Ascend-cann-toolkit_{version}_linux-{arch}.run --check
./Ascend-cann-kernels-{soc}_{version}_linux.run --check
# Install
./Ascend-cann-toolkit_{version}_linux-{arch}.run --install
./Ascend-cann-kernels-{soc}_{version}_linux.run --install
# Set environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh
Install MindIE:
# Add execute permission to the package. ${version} is the software version and ${arch} is the CPU architecture.
chmod +x ./Ascend-mindie_${version}_linux-${arch}.run
./Ascend-mindie_${version}_linux-${arch}.run --check
# Option 1: install to the default path
./Ascend-mindie_${version}_linux-${arch}.run --install
# Set environment variables
cd /usr/local/Ascend/mindie && source set_env.sh
# Option 2: install to a specified path
./Ascend-mindie_${version}_linux-${arch}.run --install-path=${AieInstallPath}
# Set environment variables
cd ${AieInstallPath}/mindie && source set_env.sh
Download pytorch_v{pytorchversion}_py{pythonversion}.tar.gz and extract it:
tar -xzvf pytorch_v{pytorchversion}_py{pythonversion}.tar.gz
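# The extracted files include the torch_npu whl package
pip install torch_npu-{pytorchversion}.xxxx.{arch}.whl
Download the model weights:
- https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers
- https://huggingface.co/openai/clip-vit-large-patch14
- https://huggingface.co/tencent/HunyuanVideo
The weight directory is organized as follows:
HunyuanVideo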
├──README.md
├──hunyuan-video-t2v-720p
│ ├──transformers
│ ├──vae
├──llava-llama-3-8b-v1_1-transformers
├──clip-vit-large-patch14
Currently supported resolutions:
| Resolution | h/w=9:16 | h/w=16:9 | h/w=4:3 | h/w=3:4 | h/w=1:1 |
|---|---|---|---|---|---|
| 720P | 720x1280 | 1280x720 | 1104x832 | 832x1104 | 960x960 |
Currently supported card counts: 1, 2, 3, 4, 6, 8, 16
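Before launching a job, it can help to check the requested --video-size and card count against the two lists above. The following is a minimal sketch, not part of the hunyuan_video repository; SUPPORTED_SIZES, SUPPORTED_CARD_COUNTS and check_request are hypothetical names. The aspect-ratio labels are the table's nominal ones (1104x832 is only approximately 4:3).

```python
"""Hypothetical pre-flight check for --video-size and the card count (not part of hunyuan_video)."""

# Supported 720p sizes from the resolution table, keyed by (height, width),
# i.e. the two numbers passed to --video-size, mapped to the table's nominal h/w label.
SUPPORTED_SIZES = {
    (720, 1280): "9:16",
    (1280, 720): "16:9",
    (1104, 832): "4:3",
    (832, 1104): "3:4",
    (960, 960): "1:1",
}

# Card counts listed as supported above.
SUPPORTED_CARD_COUNTS = {1, 2, 3, 4, 6, 8, 16}


def check_request(height: int, width: int, num_cards: int) -> str:
    """Return the nominal h/w label, or raise ValueError if the request is unsupported."""
    if (height, width) not in SUPPORTED_SIZES:
        raise ValueError(f"--video-size {height} {width} is not a supported 720p size")
    if num_cards not in SUPPORTED_CARD_COUNTS:
        raise ValueError(f"{num_cards} cards is not a supported card count")
    return SUPPORTED_SIZES[(height, width)]


if __name__ == "__main__":
    print(check_request(720, 1280, 8))  # 9:16
```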
git clone https://modelers.cn/MindIE/hunyuan_video.git
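cd hunyuan_video
Preprocess the llava-llama-3-8b-v1_1-transformers weights to produce the text_encoder directory:
python hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py --input_dir llava-llama-3-8b-v1_1-transformers --output_dir text_encoder
The weight directory after this change is as follows: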
HunyuanVideo
├──README.md
├──hunyuan-video-t2v-720p
│ ├──transformers
│ ├──vae
├──text_encoder
├──clip-vit-large-patch14
Install the dependencies:
pip3 install -r requirements.txt
Single-card inference, run:
export TOKENIZERS_PARALLELISM=false
export ALGO=0
python sample_video.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--model-resolution "720p" \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--flow-reverse \
--num-videos 1 \
--device_id 0 \
--save-path ./results
Parameter description:
Single-card inference with --use_cache, --use_cache_double and --use-cpu-offload, run:
export TOKENIZERS_PARALLELISM=false
export ALGO=0
python sample_video.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--model-resolution "720p" \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--flow-reverse \
--num-videos 1 \
--device_id 0 \
--use_cache \
--use_cache_double \
--use-cpu-offload \
--save-path ./results
Parameter description:
8-card parallel inference, run:
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export ALGO=0
torchrun --nproc_per_node=8 sample_video.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--model-resolution "720p" \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--flow-reverse \
--ulysses-degree 8 \
--ring-degree 1 \
--vae-parallel \
--num-videos 1 \
--save-path ./results
Parameter description:
1. Using attentioncache (8 cards), run:
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export ALGO=0
torchrun --nproc_per_node=8 sample_video.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--model-resolution "720p" \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--flow-reverse \
--ulysses-degree 8 \
--ring-degree 1 \
--vae-parallel \
--use_attentioncache \
--num-videos 1 \
--save-path ./results
Parameter description:
16-card parallel inference, run:
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export ALGO=0
torchrun --nproc_per_node=16 sample_video.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--model-resolution "720p" \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--flow-reverse \
--ulysses-degree 8 \
--ring-degree 2 \
--vae-parallel \
--save-path ./results
Parameter description:
16-card parallel inference with attentioncache, run:
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export ALGO=0
torchrun --nproc_per_node=16 sample_video.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--model-resolution "720p" \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--flow-reverse \
--ulysses-degree 8 \
--ring-degree 2 \
--vae-parallel \
--use_attentioncache \
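--save-path ./results
Parameter description:
This project adds quantization support for 8-bit weights (w8) combined with 8-bit or 16-bit activations (a8/a16), which reduces the model's memory footprint while preserving inference performance.
Refer to the official README.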
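| Parameter | Description | Options | Default |
|---|---|---|---|
| --quant_save_dir | Directory where the quantized model is saved | - | ./hunyuan_quant_weights |
| --quant_mode | Quantization mode | w8a8 (8-bit weights + 8-bit activations), w8a16 (8-bit weights + 16-bit activations) | w8a8 |
| --is_dynamic | Enable dynamic quantization (activation quantization parameters are computed at run time) | Flag; pass it to enable | False |
| --w_sym | Use symmetric quantization for weights | Flag; pass it to enable | False |
| --act_method | Activation quantization method (label-free scenario) | 1 (min-max), 2 (histogram), 3 (auto-mixed, recommended for LLMs) | 3 |
| --disable_quant_layers | Names of layers to exclude from quantization | - | ["time_in.mlp.0", "time_in.mlp.2"] |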
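The --is_dynamic and --w_sym options refer to standard quantization ideas. The sketch below is a generic pure-Python illustration of symmetric per-tensor int8 quantization, contrasting a static activation scale (fixed from calibration data) with a dynamic one (recomputed for each input). It is not the algorithm implemented by quantization/quant.py, and all function names are hypothetical.

```python
"""Generic illustration of symmetric int8 quantization with static vs dynamic scales.

This is NOT the algorithm used by quantization/quant.py; all names are hypothetical.
"""

INT8_MIN, INT8_MAX = -128, 127


def symmetric_scale(values: list[float]) -> float:
    """Per-tensor symmetric scale that maps max |x| onto 127 (zero point is 0)."""
    amax = max(abs(v) for v in values)
    return amax / INT8_MAX if amax > 0 else 1.0


def quantize(values: list[float], scale: float) -> list[int]:
    """Round to the nearest integer step and clamp to the int8 range."""
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale))) for v in values]


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to real values."""
    return [x * scale for x in q]


def max_abs_error(values: list[float], scale: float) -> float:
    """Largest reconstruction error after a quantize/dequantize round trip."""
    restored = dequantize(quantize(values, scale), scale)
    return max(abs(a - b) for a, b in zip(values, restored))


if __name__ == "__main__":
    calibration = [0.5, -1.0, 0.8]  # activations seen while calibrating
    runtime = [3.0, -0.2, 0.1]      # a later input with a wider range

    static_scale = symmetric_scale(calibration)  # fixed ahead of time
    dynamic_scale = symmetric_scale(runtime)     # recomputed for this input

    # The static scale saturates 3.0 at 127 * (1/127) = 1.0; the dynamic scale does not clip.
    print(max_abs_error(runtime, static_scale))
    print(max_abs_error(runtime, dynamic_scale))
```

With the static scale, the 3.0 activation saturates at 1.0 (an error of about 2.0), while the dynamic scale keeps every error within half a quantization step, at the cost of computing the scale for each input.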
export TOKENIZERS_PARALLELISM=false
export ALGO=0
python quantization/quant.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--quant_save_dir ./quant_w8a8_dynamic \
--quant_mode w8a8 \
--is_dynamic \
--w_sym \
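--device_id 0  # use device 0
After the command finishes, two files are generated in the quant_w8a8_dynamic directory:
- quant_model_description_w8a8_dynamic.json: quantization description file (metadata such as quantization bit widths and layer mappings)
- quant_model_weight_w8a8_dynamic.safetensors: quantized weight file in safetensors format, compatible with the Hugging Face ecosystem
Install the CANN NNAL package:
# Add execute permission to the package. <version> is the software version and <arch> is the CPU architecture.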
chmod +x Ascend-cann-nnal_<version>_linux-<arch>.run
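# Install to the default path:
./Ascend-cann-nnal_<version>_linux-<arch>.run --install --torch_atb
# Configure environment variables:
source ${HOME}/Ascend/nnal/atb/set_env.sh
To run inference with a quantized model, add the --quant_desc_path argument to the original sample_video.py command and point it to the quantization description file (quant_model_description_*.json). This path must be absolute; all other arguments are the same as for inference with the original model.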
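Because --quant_desc_path must be an absolute path, it can be convenient to resolve it programmatically and confirm that the matching weight file is present before exporting file_absolute_path. A minimal sketch, assuming the naming pattern shown above (quant_model_description_<tag>.json paired with quant_model_weight_<tag>.safetensors in the same directory); resolve_quant_desc and the script name resolve_quant_desc.py are hypothetical, not part of the repository.

```python
"""Hypothetical helper: resolve --quant_desc_path and check its paired weight file."""
import sys
from pathlib import Path

DESC_PREFIX = "quant_model_description"
WEIGHT_PREFIX = "quant_model_weight"


def resolve_quant_desc(path: str) -> Path:
    """Return the absolute description-file path after checking that both files exist.

    Assumes the pairing shown above, e.g.
    quant_model_description_w8a8_dynamic.json <-> quant_model_weight_w8a8_dynamic.safetensors.
    """
    desc = Path(path).expanduser().resolve()
    if desc.suffix != ".json" or not desc.name.startswith(DESC_PREFIX):
        raise ValueError(f"not a quant_model_description_*.json file: {desc}")
    # Swap the prefix and the extension to get the expected weight file name.
    weight_name = desc.name.replace(DESC_PREFIX, WEIGHT_PREFIX, 1)
    weight = desc.with_name(weight_name).with_suffix(".safetensors")
    for required in (desc, weight):
        if not required.is_file():
            raise FileNotFoundError(f"missing quantization file: {required}")
    return desc


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(resolve_quant_desc(sys.argv[1]))
    else:
        print("usage: python resolve_quant_desc.py <quant_model_description_*.json>", file=sys.stderr)
```

For example: export file_absolute_path="$(python resolve_quant_desc.py ./quant_w8a8_dynamic/quant_model_description_w8a8_dynamic.json)".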
export TOKENIZERS_PARALLELISM=false
export ALGO=0
export file_absolute_path="your local quant description file absolute path"
python sample_video.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--model-resolution "720p" \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--flow-reverse \
--num-videos 1 \
--device_id 0 \
--save-path ./t2v_w8a8_dynamic_results \
--quant_desc_path ${file_absolute_path}
8-card quantized inference (file_absolute_path as exported above), run:
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=1
export TOKENIZERS_PARALLELISM=false
export ALGO=0
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node=8 --master-port 29503 sample_video.py \
--model-base HunyuanVideo \
--dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
--vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \
--text-encoder-path HunyuanVideo/text_encoder \
--text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \
--model-resolution "720p" \
--video-size 720 1280 \
--video-length 81 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--flow-reverse \
--ulysses-degree 8 \
--ring-degree 1 \
--vae-parallel \
--num-videos 1 \
--save-path ./t2v_w8a8_dynamic_results \
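--quant_desc_path ${file_absolute_path}
- --quant_desc_path: path to the quantization description file; specifying it automatically enables quantized inference.
- Dynamic quantization (passing --is_dynamic) suits scenarios where activation distributions fluctuate widely; static quantization (omitting --is_dynamic) requires activation ranges to be computed from calibration data in advance.
- Use --disable_quant_layers to exclude layers from quantization, for example precision-sensitive timestep embedding layers.
We generated videos from prompts.txt with five seeds (42 to 46), evaluated them with VBench, and averaged the scores. The six metrics are:
| Resolution (h*w) | dynamic_degree | subject_consistency | imaging_quality | aesthetic_quality | overall_consistency | motion_smoothness |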
|---|---|---|---|---|---|---|
| 720*1280 | 0.1516 | 0.9774 | 0.5283 | 0.6048 | 0.291 | 0.9931 |
Note: The accuracy of the quantized model can be checked against the data above; the actual deviation is within ±5%.
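To compare your own VBench averages with the table above, a small helper can compute the per-metric relative deviation. This sketch assumes the ±5% tolerance is relative to each reference value; REFERENCE, relative_deviation and within_tolerance are hypothetical names, and the example scores are made up for illustration, not measured results.

```python
"""Hypothetical helper: compare VBench averages with the reference table above."""

# Reference averages from the table (720*1280, seeds 42-46, prompts.txt).
REFERENCE = {
    "dynamic_degree": 0.1516,
    "subject_consistency": 0.9774,
    "imaging_quality": 0.5283,
    "aesthetic_quality": 0.6048,
    "overall_consistency": 0.291,
    "motion_smoothness": 0.9931,
}


def relative_deviation(measured: dict[str, float],
                       reference: dict[str, float] = REFERENCE) -> dict[str, float]:
    """Return (measured - reference) / reference for every reference metric."""
    missing = sorted(set(reference) - set(measured))
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return {name: (measured[name] - ref) / ref for name, ref in reference.items()}


def within_tolerance(measured: dict[str, float], tolerance: float = 0.05,
                     reference: dict[str, float] = REFERENCE) -> bool:
    """True if every metric deviates from its reference by at most `tolerance` (relative)."""
    return all(abs(d) <= tolerance for d in relative_deviation(measured, reference).values())


if __name__ == "__main__":
    # Made-up scores for illustration only: each metric 3% below its reference.
    example = {name: ref * 0.97 for name, ref in REFERENCE.items()}
    print(within_tolerance(example))  # True
```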
Currently supported parallel configurations:
| --video-size | --ulysses-degree x --ring-degree | --nproc_per_node | --video-length |
|---|---|---|---|
| 720 1280 or 1280 720 | 8x2 | 16 | 129 |
| 1104 832 or 832 1104 | 8x2 | 16 | 129 |
| 960 960 | 8x2 | 16 | 129 |
| 720 1280 or 1280 720 | 8x1 | 8 | 129 |
| 1104 832 or 832 1104 | 8x1 | 8 | 129 |
| 960 960 | 8x1 | 8 | 129 |
| 720 1280 or 1280 720 | 6x1 | 6 | 129 |
| 1104 832 or 832 1104 | 6x1 | 6 | 129 |
| 960 960 | 6x1 | 6 | 129 |
| 720 1280 or 1280 720 | 4x1 | 4 | 129 |
| 1104 832 or 832 1104 | 4x1 | 4 | 129 |
| 960 960 | 4x1 | 4 | 129 |
| 720 1280 or 1280 720 | 3x1 | 3 | 129 |
| 1104 832 or 832 1104 | 3x1 | 3 | 129 |
| 960 960 | 3x1 | 3 | 129 |
| 720 1280 or 1280 720 | 2x1 | 2 | 129 |
| 1104 832 or 832 1104 | 2x1 | 2 | 129 |
| 960 960 | 2x1 | 2 | 129 |
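The table reduces to a few rules: the size must be one of the five listed, the (--ulysses-degree, --ring-degree) pair must be one of the listed pairs and their product must equal --nproc_per_node, and --video-length is 129. A minimal sketch that checks a planned launch against exactly these rows; validate_parallel_args and the constant names are hypothetical, and the check only knows about the rows listed here.

```python
"""Hypothetical check of torchrun arguments against the supported-parallelism table."""

# --video-size values (height, width) listed in the table.
TABLE_SIZES = {(720, 1280), (1280, 720), (1104, 832), (832, 1104), (960, 960)}
# (--ulysses-degree, --ring-degree) pairs listed in the table.
TABLE_DEGREES = {(8, 2), (8, 1), (6, 1), (4, 1), (3, 1), (2, 1)}
# Every row in the table uses this --video-length.
TABLE_VIDEO_LENGTH = 129


def validate_parallel_args(height: int, width: int, ulysses: int, ring: int,
                           nproc: int, video_length: int = TABLE_VIDEO_LENGTH) -> list[str]:
    """Return a list of problems; an empty list means the combination is in the table."""
    problems = []
    if (height, width) not in TABLE_SIZES:
        problems.append(f"--video-size {height} {width} is not listed")
    if (ulysses, ring) not in TABLE_DEGREES:
        problems.append(f"--ulysses-degree {ulysses} x --ring-degree {ring} is not listed")
    if ulysses * ring != nproc:
        problems.append(f"{ulysses} x {ring} != --nproc_per_node {nproc}")
    if video_length != TABLE_VIDEO_LENGTH:
        problems.append(f"--video-length {video_length} is not listed (table uses {TABLE_VIDEO_LENGTH})")
    return problems


if __name__ == "__main__":
    print(validate_parallel_args(720, 1280, 8, 2, 16))  # []
```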
Cache options for each parallelism degree:
| Parallelism degree | Options |
|---|---|
| 2 | --use_cache --use_cache_double |
| 3 | --use_cache --use_cache_double |
| 4 | --use_cache --use_cache_double |
| 8 | --use_attentioncache |
| 16 | --use_attentioncache |
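Combining the cache table with the multi-card commands above, the torchrun command line can be assembled from the parallelism degree. A minimal sketch, not a script shipped with hunyuan_video: build_command is a hypothetical helper whose fixed arguments are copied from the 720x1280 examples in this README, and including --vae-parallel for 2 to 4 cards is an assumption, since the README only shows it in the 8- and 16-card commands.

```python
"""Hypothetical builder for the multi-card torchrun command (not part of hunyuan_video)."""
import shlex

# Cache options per parallelism degree, copied from the table above.
CACHE_FLAGS = {
    2: ["--use_cache", "--use_cache_double"],
    3: ["--use_cache", "--use_cache_double"],
    4: ["--use_cache", "--use_cache_double"],
    8: ["--use_attentioncache"],
    16: ["--use_attentioncache"],
}


def build_command(nproc: int, ulysses: int, ring: int, prompt: str,
                  model_base: str = "HunyuanVideo", save_path: str = "./results",
                  use_cache: bool = True) -> list[str]:
    """Return a torchrun argv mirroring the multi-card examples in this README."""
    if ulysses * ring != nproc:
        raise ValueError("--ulysses-degree x --ring-degree must equal --nproc_per_node")
    if use_cache and nproc not in CACHE_FLAGS:
        raise ValueError(f"no cache option is listed for parallelism degree {nproc}")
    dit_dir = f"{model_base}/hunyuan-video-t2v-720p"
    argv = [
        "torchrun", f"--nproc_per_node={nproc}", "sample_video.py",
        "--model-base", model_base,
        "--dit-weight", f"{dit_dir}/transformers/mp_rank_00_model_states.pt",
        "--vae-path", f"{dit_dir}/vae",
        "--text-encoder-path", f"{model_base}/text_encoder",
        "--text-encoder-2-path", f"{model_base}/clip-vit-large-patch14",
        "--model-resolution", "720p",
        "--video-size", "720", "1280",
        "--video-length", "129",
        "--infer-steps", "50",
        "--prompt", prompt,
        "--seed", "42",
        "--flow-reverse",
        "--ulysses-degree", str(ulysses),
        "--ring-degree", str(ring),
        # Assumption: shown only in the 8- and 16-card examples; applied to every degree here.
        "--vae-parallel",
    ]
    if use_cache:
        argv += CACHE_FLAGS[nproc]
    argv += ["--save-path", save_path]
    return argv


if __name__ == "__main__":
    cmd = build_command(8, 8, 1, "A cat walks on the grass, realistic style.")
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The environment variables from the multi-card examples (PYTORCH_NPU_ALLOC_CONF, TASK_QUEUE_ENABLE, CPU_AFFINITY_CONF, TOKENIZERS_PARALLELISM, ALGO) still need to be exported before running the printed command, or passed through the env argument of subprocess.run.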
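The quant_model_description.json and quant_model_weight.safetensors files must have corresponding paths and consistent file names (differing only in the suffix).
This model uses the following optimizations: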