YingMusic-SVC Singing Voice Conversion Model Deployment Guide

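1. Model overview and use cases

YingMusic-SVC (Singing Voice Conversion) is an inference-oriented multimodal model. In the audio domain its focus is zero-shot singing voice conversion that is usable on real songs. It is optimized throughout for real-world music: it suppresses the interference that accompaniment, harmony vocals, and reverb cause during conversion, and it significantly reduces the risk of voice cracking and high-note distortion, which gives high-quality music re-creation a stable technical foundation.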

2. Preparing the runtime environment

Version compatibility table

Component	Version	Setup guide
CANN	8.2.0	-
Python	3.11.6	-
torch	2.8.0	-
torch_npu	2.8.0	-
torchaudio	2.8.0	-
torchvision	0.23.0	-
sox	1.5.0	-

2.1 Environment

Server: Atlas 800T A2 (8×64G)
NPU card type: 910B2
Deployment mode: single card
Operating system: openEuler (ARM architecture)

2.2 Image download

Source: Ascend Hub
Version: 2.1.RC2-800I-A2-py311-openeuler24.03-lts (ARM architecture)

3. Running the model

3.1 Source code

Source: Github

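3.2 Custom adaptation code

Source: Gitcode
Note: after downloading the adaptation code, copy it over the source tree so that it overwrites the upstream files of the same name.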
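
The overwrite step can be sketched as a small shell function. This is only an illustration: the directory names `adapt` and `YingMusic-SVC` and the file `example.txt` are hypothetical stand-ins for wherever the two downloads were unpacked, and the demo runs entirely inside a temporary directory.

```shell
# Copy everything under ADAPT_DIR over SRC_DIR, replacing files of the same name.
overlay_code() {
    adapt_dir=$1
    src_dir=$2
    [ -d "$adapt_dir" ] && [ -d "$src_dir" ] || {
        echo "both directories must exist" >&2
        return 1
    }
    # "dir/." copies the directory contents, including dotfiles
    cp -R "$adapt_dir"/. "$src_dir"/
}

# Demo in a throwaway directory (names are hypothetical placeholders)
work=$(mktemp -d)
mkdir -p "$work/adapt" "$work/YingMusic-SVC"
echo "upstream" > "$work/YingMusic-SVC/example.txt"
echo "adapted" > "$work/adapt/example.txt"
overlay_code "$work/adapt" "$work/YingMusic-SVC"
cat "$work/YingMusic-SVC/example.txt"   # prints "adapted"
rm -rf "$work"
```

Files that exist only in the source tree are left untouched; only files with matching relative paths are replaced.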

3.3 Installing dependencies

(1) Python

cd YingMusic-SVC
pip install -r requirements.txt

(2) openEuler

yum install gcc gcc-c++ cmake sox ffmpeg sox-devel
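
After the system packages are installed, it can be useful to confirm that the expected commands are actually on `PATH`. The following is a minimal sketch; the helper name `missing_tools` is made up for this example, and the command list simply mirrors the tools named in the install step.

```shell
# Print each command from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands expected after the install step above
missing=$(missing_tools gcc g++ cmake sox ffmpeg)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All build and audio tools found."
fi
```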

3.4 Model weights and related files: download sources and storage paths

File name	Source	Storage path
`bs_roformer.ckpt`	HF Mirror - bs_roformer	`YingMusic-SVC/accom_separation/ckpt/bs_roformer`
`YingMusic-SVC-full.pt`	HF Mirror - YingMusic-SVC-full	`YingMusic-SVC/path/to`
`rmvpe.pt`	HF Mirror - rmvpe	`YingMusic-SVC/path/to`
`campplus_cn_common.bin`	HF Mirror - campplus_cn_common	`YingMusic-SVC/path/to`
`bigvgan_generator.pt` and `config.json`	HF Mirror - bigvgan	`YingMusic-SVC/nvidia/bigvgan_v2_44khz_128band_512x`
All Whisper Small files	HF Mirror - whisper small	`YingMusic-SVC/openai/whisper-small`
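
Before running inference, the layout in the table can be checked with a short script. This is a sketch under the assumption that the repository root is a directory named `YingMusic-SVC` (or is passed as the first argument); the function name `missing_weights` is hypothetical. For Whisper Small only the directory is checked, since the table does not list its individual files.

```shell
# Print every expected weight file that is missing under the repository root.
missing_weights() {
    root=$1
    for rel in \
        accom_separation/ckpt/bs_roformer/bs_roformer.ckpt \
        path/to/YingMusic-SVC-full.pt \
        path/to/rmvpe.pt \
        path/to/campplus_cn_common.bin \
        nvidia/bigvgan_v2_44khz_128band_512x/bigvgan_generator.pt \
        nvidia/bigvgan_v2_44khz_128band_512x/config.json
    do
        [ -f "$root/$rel" ] || printf '%s\n' "$rel"
    done
    # Whisper Small ships several files; only check that the directory exists
    [ -d "$root/openai/whisper-small" ] || printf '%s\n' "openai/whisper-small/"
}

# Lists nothing when every file is in place
missing_weights "${1:-YingMusic-SVC}"
```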

3.5 Running inference and expected output

Run the inference script:

cd YingMusic-SVC
bash my_infer.sh

Sample output log:

npu is available, use --force_cpu to disable it.
Using device: npu:0
SageAttention not found. Will fall back to PyTorch SDPA (if available) or manual einsum.
Using flash attention if input tensor is on npu
Start from checkpoint: ckpt/bs_roformer/bs_roformer.ckpt
Instruments: ['vocals', 'backing_vocal', 'instrumental']
Model load time: xxx
Total files found: 1. Using sample rate: 44100
Processing track: ../instrumental20260210162934/input_folder/source.mp3
Processing audio chunks: 0%| | 0/2469600 [00:00
Elapsed time: xxx seconds. 
[INFO] Inference completed successfully.
[INFO] Total time: xxx seconds.
accompany:instrumental20260210162934/store_dir/source/instrumental.wav
using npu: npu:0
Start fp16 to accelerate inference！
load model from path/to/YingMusic-SVC-full.pt
load config from ./configs/YingMusic-SVC.yml
cfm loaded
length_regulator loaded
Loading config.json from local directory
Loading weights from local directory
Removing weight norm...
test/source.mp3 test/target.mp3
It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
..F0：57.16 ~ 211.94 Hz
auto predicted pitch shift: -10.252620650824467
automatic pitch shift -12 semi tones
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 26.46it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 27.03it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 27.00it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 27.00it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 27.00it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 26.83it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 27.03it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 27.04it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 27.05it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 27.05it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:03<00:00, 30.88it/s]
RTF: xxx
export file:outputs/your_exp_name/accompany/target_source_-12.wav

When inference completes, the generated result is written to:

outputs/your_exp_name/accompany/target_source_-12.wav

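4. FAQ

4.1 Switching the source audio

Edit the `source` variable in my_infer.sh, for example: source='prompts/syz.wav'

4.2 Switching the target audio

Edit the `target` variable in my_infer.sh, for example: target="prompts/tp.wav"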
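
When several audio pairs need to be processed, editing the script by hand each time gets tedious. The sketch below rewrites the two assignments with `sed`. It assumes that my_infer.sh defines each variable on its own line in the form `source=...` and `target=...`, which matches the examples above but is not otherwise confirmed; the helper name `set_infer_var` is hypothetical, and the demo operates on a temporary stand-in file rather than the real script.

```shell
# Rewrite the line NAME=... in FILE so that NAME is assigned VALUE.
# VALUE must not contain "|", "&", or single quotes (sed delimiter and quoting).
set_infer_var() {
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Match NAME at the start of a line (optionally indented) and replace the line;
    # writing back with cat keeps the original file's permissions.
    sed "s|^\([[:space:]]*\)${name}=.*|\1${name}='${value}'|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a stand-in file that mimics the two assignments
demo=$(mktemp)
printf '%s\n' "source='prompts/syz.wav'" 'target="prompts/tp.wav"' > "$demo"
set_infer_var "$demo" source 'test/source.mp3'
set_infer_var "$demo" target 'test/target.mp3'
cat "$demo"
rm -f "$demo"
```

To apply it to the real script, call for example `set_infer_var my_infer.sh source 'test/source.mp3'` from the YingMusic-SVC directory, then rerun `bash my_infer.sh`.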