DiariZen is a speaker diarization toolkit driven by AudioZen and Pyannote 3.1. GitHub repository: https://github.com/BUTSpeechFIT/DiariZen
| Component | Version | Environment setup guide |
|---|---|---|
| Python | 3.10.12 | - |
| torch | 2.5.1+cpu | - |
| torch_npu | 2.5.1 | - |

Hardware

| Device model | NPU configuration |
|---|---|
| Atlas 800I A2 910B | 1 card |

git clone https://github.com/BUTSpeechFIT/DiariZen.git
or, via the githubfast mirror:
git clone https://githubfast.com/BUTSpeechFIT/DiariZen.git
export HF_ENDPOINT=https://hf-mirror.com
git clone https://huggingface.co/BUT-FIT/diarizen-wavlm-large-s80-md.git
git submodule init
git submodule update
If the submodule download fails, use the following commands instead:
rm -rf dscore
git clone https://githubfast.com/nryant/dscore.git dscore
conda create --name diarizen python=3.10
conda activate diarizen
If conda is not installed, download Miniconda3-py311_24.1.2-0-Linux-x86_64.sh and install it first:
bash Miniconda3-py311_24.1.2-0-Linux-x86_64.sh
pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt && pip install -e .
pip install ml-dtypes cloudpickle
pip install torch-npu==2.5.1 -i https://mirrors.huaweicloud.com/repository/pypi/simple --no-cache-dir
Note: the torch-npu version must match the torch version; a mismatch tends to cause various errors later on.
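
The version-matching rule above can also be checked from Python before anything runs on the NPU. The following is a minimal sketch, not part of DiariZen or torch_npu: the helper names `release_prefix`, `versions_match`, and `installed_version` are hypothetical, and it assumes that "matching" means the first three numeric components agree once a local tag such as `+cpu` is dropped.

```python
from importlib import metadata


def release_prefix(version_string, parts=3):
    """Drop a local tag such as '+cpu' and keep the first `parts` components."""
    public = version_string.split("+", 1)[0]
    return tuple(public.split(".")[:parts])


def versions_match(torch_version, torch_npu_version):
    """Assumed rule: torch and torch-npu agree on major.minor.patch."""
    return release_prefix(torch_version) == release_prefix(torch_npu_version)


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v = installed_version("torch")
    npu_v = installed_version("torch-npu")
    if torch_v is None or npu_v is None:
        print(f"not installed: torch={torch_v}, torch-npu={npu_v}")
    else:
        status = "match" if versions_match(torch_v, npu_v) else "MISMATCH"
        print(f"torch {torch_v} / torch-npu {npu_v}: {status}")
```

Run inside the diarizen environment, the script reports the same information as the manual `pip list | grep torch` check, with the comparison done for you.
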
(diarizen) [root:DiariZen]$ pip list | grep torch
torch 2.5.1+cpu
torch-npu 2.5.1
torchaudio 2.5.1+cpu
torchinfo 1.8.0
torchvision 0.20.1+cpu

1. Install the default pyannote.audio version, 3.1.1:
cd pyannote-audio && pip install -e .[dev,testing]
cd ../

2. Download the pyannote packages other than pyannote.audio (the versions that go with pyannote.audio 3.1.1):
wget https://githubfast.com/pyannote/pyannote-core/archive/refs/tags/5.0.0.tar.gz
wget https://githubfast.com/pyannote/pyannote-database/archive/refs/tags/5.0.1.tar.gz
wget https://githubfast.com/pyannote/pyannote-metrics/archive/refs/tags/3.2.tar.gz
wget https://githubfast.com/pyannote/pyannote-pipeline/archive/refs/tags/3.0.1.tar.gz

3. Rename and extract the archives:
mv 5.0.0.tar.gz pyannote-core5.0.0.tar.gz
mv 5.0.1.tar.gz pyannote-database5.0.1.tar.gz
mv 3.2.tar.gz pyannote-metrics3.2.tar.gz
mv 3.0.1.tar.gz pyannote.pipeline3.0.1.tar.gz
tar -zxvf pyannote-core5.0.0.tar.gz
tar -zxvf pyannote-database5.0.1.tar.gz
tar -zxvf pyannote-metrics3.2.tar.gz
tar -zxvf pyannote.pipeline3.0.1.tar.gz

4. Install the other pyannote packages:
pip uninstall pyannote-database pyannote-metrics pyannote-pipeline pyannote-core
cd pyannote-metrics-3.2 && pip install -e .
cd ../pyannote-pipeline-3.0.1 && pip install -e .
cd ../pyannote-database-5.0.1 && pip install -e .
cd ../pyannote-core-5.0.0 && pip install -e .
cd ../
Note: if pyannote-core, pyannote-database, and the other pyannote packages are not installed, inference fails with errors.
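
Before moving on to inference, it can help to confirm that the pyannote packages from the steps above are importable. The following is a minimal sketch under stated assumptions: `missing_modules` is a hypothetical helper, and the import names in `PYANNOTE_MODULES` are assumed from the package names used above. It only locates the modules and does not import them fully.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names in `module_names` that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. 'pyannote') is itself absent
            # or the module is in a broken state.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names assumed from the packages installed in the steps above.
PYANNOTE_MODULES = [
    "pyannote.audio",
    "pyannote.core",
    "pyannote.database",
    "pyannote.metrics",
    "pyannote.pipeline",
]

if __name__ == "__main__":
    absent = missing_modules(PYANNOTE_MODULES)
    print("missing:", ", ".join(absent) if absent else "none")
```
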

apt update && apt install ffmpeg
pip install numpy==1.26.4

In the file "/root/miniconda3/envs/diarizen/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", change line 616 to:
# spectrum = torch.fft.rfft(strided_input).abs()
c = torch.fft.rfft(strided_input)
spectrum = torch.hypot(c.real, c.imag)
Without this change, torch.fft.rfft(strided_input).abs() fails with an error saying that DT_COMPLEX64 is not supported.
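
The patch keeps the result unchanged because the magnitude of a complex number is the hypotenuse of its real and imaginary parts, so the replacement computes the same values as `.abs()` while operating only on real-valued data. A minimal sketch of that identity using only the Python standard library (no torch or NPU needed; `magnitude` is a hypothetical helper used for illustration):

```python
import math


def magnitude(value):
    """Magnitude of a complex number from its real and imaginary parts."""
    return math.hypot(value.real, value.imag)


samples = [3 + 4j, -1.5 + 2j, 0j, 1e-3 - 7j]
for sample in samples:
    # abs() on a complex number and hypot(real, imag) agree.
    assert math.isclose(magnitude(sample), abs(sample), rel_tol=1e-12, abs_tol=0.0)

print(magnitude(3 + 4j))
```
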

Return to the (diarizen) [root:DiariZen] directory and, following the usage instructions at https://huggingface.co/BUT-FIT/diarizen-wavlm-large-s80-md, create infer.py to run inference:
from diarizen.pipelines.inference import DiariZenPipeline
# load pre-trained model
diar_pipeline = DiariZenPipeline.from_pretrained("BUT-FIT/diarizen-wavlm-large-s80-md")
# apply diarization pipeline
diar_results = diar_pipeline('./example/EN2002a_30s.wav')
# print results
for turn, _, speaker in diar_results.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# load pre-trained model and save RTTM result
diar_pipeline = DiariZenPipeline.from_pretrained(
    "BUT-FIT/diarizen-wavlm-large-s80-md",
    rttm_out_dir='.'
)
# apply diarization pipeline
diar_results = diar_pipeline('./example/EN2002a_30s.wav', sess_name='session_name')

export HF_ENDPOINT=https://hf-mirror.com
ln -s /inspire/sj-ssd/project/embodied-multimodality-ascend/public/xxx/DiariZen/but-fit/models--but-fit--diarizen-wavlm-large-s80-md ~/.cache/huggingface/hub/models--but-fit--diarizen-wavlm-large-s80-md

Fix option 1: install numpy 1.26.4:
pip install numpy==1.26.4
Fix option 2: in pyannote/audio/pipelines/speaker_diarization.py and inference.py under the installation directory, change the code to:
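from packaging import version  # add this import if the file does not already have it
np_version = version.parse(np.__version__)
if np_version >= version.parse("2.0"):
    np_value = np.nan
else:
    np_value = np.NaN
...
Then replace np.NaN with np_value (the np.NaN alias was removed in NumPy 2.0).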
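
As an alternative to the version gate, the lowercase `np.nan` exists in both NumPy 1.x and 2.x, so rewriting the removed `np.NaN` alias directly should work with either version. The sketch below is an illustration under that assumption and is not part of pyannote.audio: `replace_nan_alias` and `patch_file` are hypothetical helpers, and the files to patch are the ones named above. Back up each file before patching it.

```python
import re
from pathlib import Path

# Matches the alias removed in NumPy 2.0, but not longer names such as np.NaNs.
_NAN_ALIAS = re.compile(r"\bnp\.NaN\b")


def replace_nan_alias(source):
    """Return `source` with every np.NaN rewritten to np.nan."""
    return _NAN_ALIAS.sub("np.nan", source)


def patch_file(path):
    """Rewrite np.NaN in one file; return True if the file changed."""
    file_path = Path(path)
    original = file_path.read_text(encoding="utf-8")
    patched = replace_nan_alias(original)
    if patched != original:
        file_path.write_text(patched, encoding="utf-8")
        return True
    return False
```

Calling `patch_file` on each of the two files performs the same substitution as the manual edit, without introducing the extra `np_value` variable.
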