Ascend-SACT/CosyVoice2-Triton

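1. Model overview

CosyVoice2 is an advanced speech synthesis model from Tongyi Lab. It focuses on low-latency, high-quality streaming speech generation and supports multiple languages, zero-shot voice cloning, and fine-grained emotion control.

This document deploys CosyVoice2 as a service on Triton Inference Server. Triton Inference Server is open-source inference serving software that supports models from multiple frameworks, concurrent execution, and dynamic batching, and it provides Python-based custom backends and model ensembles.

  • Reference: Reference practice for serving small models on Ascend with Triton Inference Server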

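Key features

Multilingual support

  • Supported languages: Chinese, English, Japanese, Korean, and Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.)
  • Cross-lingual and mixed-lingual: zero-shot voice cloning for cross-lingual and code-switching scenarios

Ultra-low latency

  • Bidirectional streaming support: CosyVoice 2.0 integrates offline and streaming modeling
  • Fast first-packet synthesis: latency as low as 150 ms while maintaining high-quality audio output

High accuracy

  • Improved pronunciation: 30% to 50% fewer pronunciation errors than CosyVoice 1.0
  • Benchmark results: lowest character error rate on the hard test set of the Seed-TTS evaluation set

Strong stability

  • Timbre consistency: reliable voice consistency for zero-shot and cross-lingual speech synthesis
  • Cross-lingual synthesis: marked improvement over version 1.0

Natural experience

  • Better prosody and sound quality: improved alignment of the synthesized audio, with the MOS evaluation score rising from 5.4 to 5.53
  • Emotional and dialectal flexibility: now supports finer-grained emotion control and accent adjustment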

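About Triton Inference Server

Triton Inference Server supports multiple inference backends, including dedicated backends (such as ONNX and TensorFlow) and the flexible Python backend. On Ascend devices, deployment can be implemented through the Python backend.

A Triton Inference Server model repository is a directory-structured model management system. It defines where models are stored in the service (a repository can hold multiple models), how each model is configured (inputs/outputs, batching, concurrency, and so on), and how versions are managed (a model can have multiple versions).

The overall layout of a model repository is as follows:

model_repository
└── model_name
    ├── 1                # version directory; multiple versions can be managed side by side
    │   └── model.py     # backend inference code for the service
    └── config.pbtxt     # model configuration, such as model name, execution backend, inputs and outputs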
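
As an illustration of the layout above, the following minimal sketch creates an empty repository skeleton for a Python-backend model. The helper name `scaffold_model_repo`, the tensor names `TEXT` and `AUDIO`, and the configuration values are assumptions made for this example; they are not taken from this project's actual config.pbtxt.

```python
from pathlib import Path

# Hypothetical minimal configuration for a Python-backend model.
# TEXT and AUDIO are placeholder tensor names, not this project's real ones.
CONFIG_TEMPLATE = """name: "{name}"
backend: "python"
max_batch_size: 0
input [
  {{
    name: "TEXT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }}
]
output [
  {{
    name: "AUDIO"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }}
]
instance_group [
  {{
    count: 1
    kind: KIND_CPU
  }}
]
"""


def scaffold_model_repo(root, model_name, version=1):
    """Create <root>/<model_name>/<version>/model.py and <model_name>/config.pbtxt."""
    model_dir = Path(root) / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder only: a real model.py defines the TritonPythonModel class.
    (version_dir / "model.py").write_text("# TritonPythonModel goes here\n")
    (model_dir / "config.pbtxt").write_text(CONFIG_TEMPLATE.format(name=model_name))
    return model_dir
```

Calling `scaffold_model_repo("model_repository", "model_name")` produces the same tree as shown above, with placeholder file contents.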

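2. Image-based deployment

This project provides a prebuilt image, cosyvoice2-triton.tar.gz, which can be downloaded and deployed directly. For the detailed deployment procedure, see section 3 (Detailed deployment steps) and the sections that follow.

2.1 Import the image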

docker load < cosyvoice2-triton.tar.gz

2.2 Create the container

docker run -itd --privileged --name=triton_server_cosyvoice2 --net=host --shm-size=500g \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/sbin/:/usr/local/sbin/ \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-e LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH \
cosyvoice2-tritonserver:24.10-8.3.rc2 /bin/bash
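
The command above exposes eight NPUs (davinci0 to davinci7). On a host with a different number of cards, the argument list can be generated instead of edited by hand. The sketch below is one way to do that; `build_docker_run_args` is a hypothetical helper written for this example, and unlike the original command it does not append the host's existing LD_LIBRARY_PATH.

```python
import shlex

# Device nodes and mounts copied from the docker run command above.
CONTROL_DEVICES = ("/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc")
MOUNTS = (
    "/usr/local/Ascend/driver",
    "/usr/local/Ascend/add-ons/",
    "/usr/local/dcmi",
    "/usr/local/sbin/",
    "/usr/local/bin/npu-smi",
)
DRIVER_LIBS = (
    "/usr/local/Ascend/driver/lib64/common:"
    "/usr/local/Ascend/driver/lib64/driver"
)


def build_docker_run_args(image, name, npu_count=8, shm_size="500g"):
    """Return the docker run argv for a container that sees npu_count NPUs."""
    args = ["docker", "run", "-itd", "--privileged", f"--name={name}",
            "--net=host", f"--shm-size={shm_size}"]
    # One --device flag per NPU card: /dev/davinci0 .. /dev/davinci<N-1>.
    args += [f"--device=/dev/davinci{i}" for i in range(npu_count)]
    args += [f"--device={dev}" for dev in CONTROL_DEVICES]
    for path in MOUNTS:
        args += ["-v", f"{path}:{path}"]
    # Assumption: the host LD_LIBRARY_PATH is not appended in this sketch.
    args += ["-e", f"LD_LIBRARY_PATH={DRIVER_LIBS}", image, "/bin/bash"]
    return args


# Print a copy-pasteable command for a two-card host.
print(shlex.join(build_docker_run_args(
    "cosyvoice2-tritonserver:24.10-8.3.rc2", "triton_server_cosyvoice2", npu_count=2)))
```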

The directory layout is as follows:

|-- model_repository
|   |-- cosyvoice
|       |-- 1
|       |   |-- CosyVoice2 # ModelZoo repository directory
|       |   |   |-- 300I
|       |   |   |   |-- diff_CosyVoice_300I.patch
|       |   |   |   |-- modeling_qwen2.py
|       |   |   |-- 800I
|       |   |   |   |-- diff_CosyVoice_800I.patch
|       |   |   |   |-- modeling_qwen2.py
|       |   |   |-- CosyVoice # CosyVoice repository directory
|       |   |   |   |-- cosyvoice
|       |   |   |   |-- transformers
|       |   |   |   |-- infer.py
|       |   |   |-- modify_onnx.py
|       |   |   |-- requirements.txt
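|       |   |-- model.py # backend inference code for the service
|       |-- client.py # client request test script
|       |-- config.pbtxt # model configuration, such as model name, execution backend, inputs and outputs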

2.3 Start the model service

# Load the required environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$PYTHONPATH:/opt/tritonserver/model_repository/cosyvoice/1/CosyVoice2/CosyVoice/third_party/Matcha-TTS
export PYTHONPATH=$PYTHONPATH:/opt/tritonserver/model_repository/cosyvoice/1/CosyVoice2/CosyVoice/transformers/src
export PYTHONPATH=$PYTHONPATH:/opt/tritonserver/model_repository/cosyvoice/1/CosyVoice2/CosyVoice
export PYTHONIOENCODING=utf-8

# Start the Triton server
tritonserver --model-repository=./model_repository/ --http-address=0.0.0.0 --http-port=8989
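
Model loading can take a while, so it helps to wait until the server reports ready before sending requests. The sketch below polls Triton's HTTP readiness endpoint (`/v2/health/ready`) on the port used above. The function name `wait_until_ready` and the timeout values are assumptions chosen for this example.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://127.0.0.1:8989", timeout_s=600.0, interval_s=2.0):
    """Poll GET /v2/health/ready until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or a non-2xx status: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A typical use is `wait_until_ready()` right before running the client script; it returns `False` if the server never became ready within the timeout.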

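2.4 Client test

# The audio generated by the client script is saved to sft_result_triton.wav
# The first inference takes longer because it triggers compilation; later inferences reuse the compiled result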
python3 model_repository/cosyvoice/client.py
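
For readers who want to see what a request to the server looks like without reading client.py, here is a minimal sketch that uses Triton's HTTP/REST inference protocol (`POST /v2/models/<model>/infer`) with only the standard library. The tensor names `TEXT` and `AUDIO` are placeholders assumed for this example; the real names are defined in this project's config.pbtxt, and client.py may build its request differently.

```python
import json
import urllib.request


def build_infer_request(text, input_name="TEXT", output_name="AUDIO"):
    """Build a v2 inference request body carrying one string input."""
    return {
        "inputs": [
            {"name": input_name, "shape": [1], "datatype": "BYTES", "data": [text]}
        ],
        "outputs": [{"name": output_name}],
    }


def infer(text, base_url="http://127.0.0.1:8989", model="cosyvoice"):
    """POST the request to the server and return the decoded JSON response."""
    body = json.dumps(build_infer_request(text)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v2/models/{model}/infer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: the first request may trigger compilation.
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)
```

Keeping the payload construction in its own function makes it easy to inspect or log the request before sending it.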

FAQ

  • The service fails at startup with "AttributeError: 'ClassDef' object has no attribute 'type_params'"
    ModelScope has a version-check issue on Python 3.11 and earlier. Delete /root/.cache/modelscope/ast_indexer to clear the AST cache.
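
The cache cleanup described above can also be scripted, for example as part of a container start-up step. This is a small sketch; `clear_ast_index` is a hypothetical helper name, and it handles both cases where the cache path is a file or a directory.

```python
import shutil
from pathlib import Path


def clear_ast_index(cache_path="/root/.cache/modelscope/ast_indexer"):
    """Delete the ModelScope AST cache; return True if something was removed."""
    path = Path(cache_path)
    if path.is_dir():
        shutil.rmtree(path)
    elif path.exists():
        path.unlink()
    else:
        return False
    return True
```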

3. Detailed deployment steps

3.1 Environment versions

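Component                  Version             Environment setup guide
Device model               Atlas 800T A2 910B  -
Firmware and driver        25.2.0              PyTorch framework inference environment setup
Triton image               24.10               Triton Inference Server | NVIDIA NGC
CANN                       8.3.RC2             Includes the kernels package and the toolkit package
Python                     3.10                -
PyTorch                    2.3.1               -
Ascend Extension PyTorch   2.3.1.post6         -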
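
To compare an existing environment against the table, the installed versions can be read with the standard library. The sketch below is only an aid under stated assumptions: the distribution names `torch` and `torch_npu` are assumed to match what pip reports in your environment, and the helper names are made up for this example.

```python
import platform
from importlib import metadata

# Versions taken from the table above.
EXPECTED = {"python": "3.10", "torch": "2.3.1", "torch_npu": "2.3.1.post6"}


def installed_versions(packages=("torch", "torch_npu")):
    """Return the Python version plus the version of each package (None if absent)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mismatches(installed, expected=EXPECTED):
    """Map each component whose version does not start with the expected prefix
    to a (found, wanted) pair."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if not (installed.get(name) or "").startswith(wanted)
    }
```

Running `mismatches(installed_versions())` returns an empty dict when the three Python-level components match the table.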

3.2 Pull the image

  • Pull the official Triton Inference Server image
docker pull nvcr.io/nvidia/tritonserver:24.10-py3

3.3 Create the container

docker run -itd --privileged --name=triton_server_cosyvoice2_24.10 --net=host --shm-size=500g \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/sbin/:/usr/local/sbin/ \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -e LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH \
    nvcr.io/nvidia/tritonserver:24.10-py3 /bin/bash
  • Enter the container
docker exec -it triton_server_cosyvoice2_24.10 /bin/bash
  • Set the environment variable; otherwise the NPU cards are not visible the next time you enter the container
echo 'export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH' >> ~/.bashrc

3.4 Deploy the CANN packages

  • Download the CANN packages: Resource Download Center, Ascend Community
wget "https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.3.RC1/Ascend-cann-kernels-910b_8.3.RC1_linux-aarch64.run?response-content-type=application/octet-stream" -O Ascend-cann-kernels-910b_8.3.RC1_linux-aarch64.run
wget "https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.3.RC1/Ascend-cann-toolkit_8.3.RC1_linux-aarch64.run?response-content-type=application/octet-stream" -O Ascend-cann-toolkit_8.3.RC1_linux-aarch64.run
  • Install the CANN packages (toolkit first, then the kernels package downloaded above)
chmod +x Ascend-cann-*
./Ascend-cann-toolkit_8.3.RC1_linux-aarch64.run --install
./Ascend-cann-kernels-910b_8.3.RC1_linux-aarch64.run --install

4. Inference service deployment

4.1 Get the source code

  1. Clone the ModelZoo repository

    git clone https://gitee.com/ascend/ModelZoo-PyTorch.git
    
    # The ModelZoo-PyTorch repository directory is referred to as ${ModelZoo-PyTorch}
  2. Clone this repository

    git clone https://atomgit.com/Ascend-SACT/CosyVoice2-Triton.git
    
    # The CosyVoice2-Triton repository directory is referred to as ${CosyVoice2-Triton}
  3. Place the CosyVoice2 model directory under the version folder

    cd ${CosyVoice2-Triton}/model_repository/cosyvoice/1
    cp -r ${ModelZoo-PyTorch}/ACL_PyTorch/built-in/audio/CosyVoice2 ./
  4. The directory layout is as follows

    CosyVoice2-Triton/
    |-- model_repository
    |   |-- cosyvoice
    |       |-- 1
    |       |   |-- CosyVoice2
    |       |   |-- model.py
    |       |-- client.py
    |       |-- config.pbtxt
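
Before moving on, it can be worth confirming that the copy in step 3 produced the layout shown in step 4. The following sketch checks for the four entries listed in that tree; `missing_paths` is a hypothetical helper name introduced for this example.

```python
from pathlib import Path

# Relative paths taken from the directory tree in step 4.
EXPECTED_PATHS = (
    "model_repository/cosyvoice/1/CosyVoice2",
    "model_repository/cosyvoice/1/model.py",
    "model_repository/cosyvoice/client.py",
    "model_repository/cosyvoice/config.pbtxt",
)


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty list from `missing_paths("CosyVoice2-Triton")` means the repository matches the tree above.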

4.2 Model migration

For model migration, refer to the CosyVoice (TorchAir) inference guide. The main steps are as follows.

  1. Deploy CosyVoice and Transformers

    cd ${CosyVoice2-Triton}/model_repository/cosyvoice/1/CosyVoice2
    
    # Get the CosyVoice source code
    git clone https://github.com/FunAudioLLM/CosyVoice
    cd CosyVoice
    git reset --hard fd45708
    git submodule update --init --recursive
    
    # Apply the patch for your device. The device used here is the 800T A2, which shares the patch file with the 800I
    git apply ../800I/diff_CosyVoice_800I.patch
    
    # Copy infer.py into CosyVoice
    cp ../infer.py ./
    
    # Get the Transformers source code (cloned inside the CosyVoice directory, matching the layout below)
    git clone https://github.com/huggingface/transformers.git
    cd transformers
    git checkout v4.37.0
    cd ..
    
    # Replace the modeling_qwen2.py model file in the transformers repository. The device used here is the 800T A2, which shares modeling_qwen2.py with the 800I.
    cp ../800I/modeling_qwen2.py ./transformers/src/transformers/models/qwen2

    The file layout is as follows

    |-- model_repository
    |   |-- cosyvoice
    |       |-- 1
    |       |   |-- CosyVoice2
    |       |   |   |-- 300I
    |       |   |   |   |-- diff_CosyVoice_300I.patch
    |       |   |   |   |-- modeling_qwen2.py
    |       |   |   |-- 800I
    |       |   |   |   |-- diff_CosyVoice_800I.patch
    |       |   |   |   |-- modeling_qwen2.py
    |       |   |   |-- CosyVoice
    |       |   |   |   |-- cosyvoice
    |       |   |   |   |-- transformers
    |       |   |   |   |-- infer.py
    |       |   |   |-- modify_onnx.py
    |       |   |   |-- requirements.txt
    |       |   |-- model.py
    |       |-- client.py
    |       |-- config.pbtxt
  2. Install dependencies

    apt-get update
    apt-get install sox git-lfs
    
    pip3 install tokenizers==0.15.1
    pip3 install "ruamel.yaml<0.17"
    
    # Build and install OpenFst manually; otherwise installing WeTextProcessing fails
    # Download and extract the source package
    wget https://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.8.3.tar.gz
    tar -xzf openfst-1.8.3.tar.gz
    # Enter the directory, then build and install
    cd openfst-1.8.3
    ./configure --enable-far --enable-mpdt --enable-pdt
    make -j$(nproc)
    make install
    cd ..
    # Confirm that the shared library exists:
    ls /usr/local/lib/libfstmpdtscript.so.26
    # Add the shared library path
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    # Install WeTextProcessing
    pip3 install WeTextProcessing==1.0.4.1
    
    # Install the requirements
    pip3 install -r ../requirements.txt
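  3. Install the msit tool

    Follow the msit installation guide and install from source

    # 1. Clone the repository (for an existing checkout, run git pull origin to update to the latest code)
    git clone https://gitee.com/ascend/msit.git
    cd msit/msit
    
    # 2. Install the msit package
    pip install .
    
    # 3. Install benchmark and surgeon; ais_bench is deployed automatically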
    msit install benchmark surgeon
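  4. Get the model weights

    cd ${CosyVoice2-Triton}/model_repository/cosyvoice/1/CosyVoice2/CosyVoice
    
    # 1. Clone the weights repository
    git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git
    cd CosyVoice2-0.5B
    # The CosyVoice2-0.5B directory is referred to as ${CosyVoice2-0.5B}
    
    # 2. Check out the target commit
    git checkout 9bd5b08fc085bd93d3f8edb16b67295606290350
    
    # 3. Pull the LFS large files (such as the model weights)
    git lfs pull
    
    # 4. This example runs inference with the SFT pretrained voices, so the spk weights must also be downloaded into the weights directory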
    wget https://www.modelscope.cn/models/iic/CosyVoice-300M-SFT/resolve/master/spk2info.pt
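  5. Convert the models

    # Modify the ONNX graph structure
    cd ${CosyVoice2-Triton}/model_repository/cosyvoice/1/CosyVoice2/
    python3 modify_onnx.py ./CosyVoice/CosyVoice2-0.5B/
    
    # Convert the models
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    
    # ${soc_version} is the target chip version: the chip name reported by npu-smi info, prefixed with Ascend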
    atc --framework=5 --soc_version=${soc_version} --model ./${CosyVoice2-0.5B}/speech_token_md.onnx --output ./${CosyVoice2-0.5B}/speech --input_shape="feats:1,128,-1;feats_length:1" --precision_mode allow_fp32_to_fp16
    atc --framework=5 --soc_version=${soc_version} --model ./${CosyVoice2-0.5B}/flow.decoder.estimator.fp32.onnx --output ./${CosyVoice2-0.5B}/flow --input_shape="x:2,80,-1;mask:2,1,-1;mu:2,80,-1;t:2;spks:2,80;cond:2,80,-1"
    atc --framework=5 --soc_version=${soc_version} --model ./${CosyVoice2-0.5B}/flow.decoder.estimator.fp32.onnx --output ./${CosyVoice2-0.5B}/flow_static --input_shape="x:2,80,-1;mask:2,1,-1;mu:2,80,-1;t:2;spks:2,80;cond:2,80,-1" --dynamic_dims="100,100,100,100;200,200,200,200;300,300,300,300;400,400,400,400;500,500,500,500;600,600,600,600;700,700,700,700" --input_format=ND
  6. Verify model inference

    cd ${CosyVoice2-Triton}/model_repository/cosyvoice/1/CosyVoice2/CosyVoice
    # 1. Select the NPU ID to use (default: 0)
    export ASCEND_RT_VISIBLE_DEVICES=0
    # 2. Set the environment variables
    export PYTHONPATH=third_party/Matcha-TTS:$PYTHONPATH
    export PYTHONPATH=transformers/src:$PYTHONPATH
    # 3. Run the inference script
    python3 infer.py --model_path=${CosyVoice2-0.5B} --stream_out
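
The long `--dynamic_dims` value in step 5 follows a simple pattern: the flow estimator's `--input_shape` has four dynamic (`-1`) axes (in `x`, `mask`, `mu`, and `cond`), and each profile repeats one sequence length for all four of them. The sketch below regenerates that string; the helper names are made up for this example, and it only builds the text, it does not call atc.

```python
# Input shape copied from the flow_static atc command in step 5.
INPUT_SHAPE = "x:2,80,-1;mask:2,1,-1;mu:2,80,-1;t:2;spks:2,80;cond:2,80,-1"


def count_dynamic_axes(input_shape):
    """Count the -1 entries across all inputs in an atc --input_shape string."""
    return sum(
        dim == "-1"
        for spec in input_shape.split(";")
        for dim in spec.split(":")[1].split(",")
    )


def build_dynamic_dims(lengths, dynamic_axes):
    """Build a --dynamic_dims value: one profile per length, repeated per dynamic axis."""
    return ";".join(",".join([str(n)] * dynamic_axes) for n in lengths)


axes = count_dynamic_axes(INPUT_SHAPE)
# Sequence lengths 100, 200, ..., 700, as in step 5.
print(build_dynamic_dims(range(100, 800, 100), axes))
```

Generating the value this way makes it easier to add or remove length profiles without miscounting the repeated entries.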

4.3 Serve the model

  1. Install Triton dependencies

    pip3 install tritonclient pyworld gevent geventhttpclient
  2. Start the service

    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    export PYTHONPATH=$PYTHONPATH:${CosyVoice2-Triton}/model_repository/cosyvoice/1/CosyVoice2/CosyVoice/third_party/Matcha-TTS
    export PYTHONPATH=$PYTHONPATH:${CosyVoice2-Triton}/model_repository/cosyvoice/1/CosyVoice2/CosyVoice/transformers/src
    export PYTHONPATH=$PYTHONPATH:${CosyVoice2-Triton}/model_repository/cosyvoice/1/CosyVoice2/CosyVoice
    export PYTHONIOENCODING=utf-8
    
    cd ${CosyVoice2-Triton}
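    tritonserver --model-repository=./model_repository/ --http-address=0.0.0.0 --http-port=8989
  3. Client test

    The audio generated by the client script is saved to sft_result_triton.wav.
    The first inference takes longer because it triggers compilation; later inferences reuse the compiled result. Consider running a warm-up request when the service starts.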

    python3 model_repository/cosyvoice/client.py
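
The warm-up suggested above can be as simple as sending a throwaway request or two right after the server becomes ready and recording how long each one takes. The sketch below shows the idea with a stand-in function; `warm_up` and `fake_infer` are hypothetical names, and `fake_infer` should be replaced by a real call to the service.

```python
import time


def warm_up(infer, text, rounds=2):
    """Call infer(text) `rounds` times and return each call's latency in seconds."""
    latencies = []
    for _ in range(rounds):
        start = time.perf_counter()
        infer(text)
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_infer(text):
    """Stand-in for a real client request; it does no synthesis."""
    return text.upper()


# With a real client, the first latency is expected to include compilation time.
print(warm_up(fake_infer, "warm-up sentence", rounds=3))
```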