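DiarizationLM-8b-Fisher-v2: a model for refining speaker-attributed transcripts, improving the accuracy of speaker labels in conversational transcription. It is a DiarizationLM model fine-tuned on the Fisher corpus from unsloth/llama-3-8b-bnb-4bit. It reduces word diarization error rate (WDER) and cpWER relative to the baseline and handles long inputs.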

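This is not an officially supported Google product.

Overview

This DiarizationLM model is fine-tuned on the training subset of the Fisher corpus.

Base model: unsloth/llama-3-8b-bnb-4bit
Fine-tuning script: https://github.com/google/speaker-id/tree/master/DiarizationLM/unsloth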

How this model differs from google/DiarizationLM-8b-Fisher-v1:

For this model, the loss is computed only on the completion tokens.
For google/DiarizationLM-8b-Fisher-v1, the loss is also computed on the prompt tokens.
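
To illustrate the difference, here is a minimal sketch of completion-only loss masking. It is not taken from the fine-tuning script, which may implement this differently. The helper name `build_labels` and the toy token ids are hypothetical; the value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, which is the convention commonly used to exclude positions from the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, completion_ids, completion_only=True):
    """Builds (input_ids, labels) for one prompt-completion pair.

    Hypothetical helper for illustration only.
    completion_only=True  -> loss on completion tokens only (the v2 behavior).
    completion_only=False -> loss on prompt and completion tokens (the v1 behavior).
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    if completion_only:
        # Prompt positions are masked out so they contribute nothing to the loss.
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    else:
        labels = list(input_ids)
    return input_ids, labels


# Toy token ids, not real tokenizer output.
ids, labels_v2 = build_labels([11, 12, 13], [21, 22])
_, labels_v1 = build_labels([11, 12, 13], [21, 22], completion_only=False)
print(labels_v2)
print(labels_v1)
```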

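Training config

This model is fine-tuned on the training subset of the Fisher corpus, using a LoRA adapter of rank 256. The total number of trainable parameters is 671,088,640. With a batch size of 16, the model was trained for 28,800 steps, which is about 9 epochs of the training data.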
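
The trainable parameter count can be reproduced with a short calculation. This sketch assumes that LoRA adapters are attached to all seven linear projections in every decoder layer (an assumption, not stated above) and uses the public Llama 3 8B dimensions. Each adapted layer with input size d_in and output size d_out adds rank * (d_in + d_out) parameters.

```python
# Llama 3 8B dimensions from the public model config.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 dims per head
NUM_LAYERS = 32
RANK = 256

# Assumption: adapters on all attention and MLP projections of each layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Counts LoRA parameters: A is (rank x d_in), B is (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


total = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
print(f"{total:,}")  # prints 671,088,640
```

Under these assumptions the result matches the figure reported above.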

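We use the mixed flavor during training, meaning we combine data from the hyp2ora and deg2ref flavors. After the prompt builder, the training set contains a total of 51,063 prompt-completion pairs.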
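
As a quick sanity check on the epoch count, the number of examples seen during training can be divided by the size of the training set. This sketch assumes that the batch size of 16 is the effective batch size per optimizer step.

```python
BATCH_SIZE = 16      # assumed to be the effective batch size per step
TRAIN_STEPS = 28_800
NUM_PAIRS = 51_063   # prompt-completion pairs in the training set

examples_seen = BATCH_SIZE * TRAIN_STEPS
epochs = examples_seen / NUM_PAIRS

print(examples_seen)     # prints 460800
print(round(epochs, 2))  # prints 9.02
```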

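Fine-tuning took more than 4 days on a Google Cloud VM instance with one NVIDIA A100 GPU with 80GB of memory.

The maximal length of the prompt to this model is 6000 characters, including the " --> " suffix. The maximal sequence length is 4096 tokens.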
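
For transcripts longer than the prompt limit, the input has to be split into segments. The following is a minimal sketch of one way to do that, not the library's prompt builder: the helper `split_into_prompts` is hypothetical, it splits on whitespace, and it repeats the active speaker tag at the start of each new segment so that every segment is self-describing.

```python
PROMPT_SUFFIX = " --> "
MAX_PROMPT_CHARS = 6000


def split_into_prompts(hypothesis, max_chars=MAX_PROMPT_CHARS, suffix=PROMPT_SUFFIX):
    """Splits a hypothesis into prompts of at most max_chars characters.

    Hypothetical helper for illustration. A single word longer than the
    budget is not handled and would produce an oversized prompt.
    """
    budget = max_chars - len(suffix)  # the suffix counts toward the limit
    prompts, words, length, speaker = [], [], 0, None
    for word in hypothesis.split():
        is_tag = word.startswith("<speaker:") and word.endswith(">")
        extra = len(word) + (1 if words else 0)  # +1 for the joining space
        if words and length + extra > budget:
            prompts.append(" ".join(words) + suffix)
            # Start a new segment, re-emitting the active speaker tag if needed.
            words = [speaker] if (speaker and not is_tag) else []
            length = len(speaker) if words else 0
            extra = len(word) + (1 if words else 0)
        words.append(word)
        length += extra
        if is_tag:
            speaker = word
    if words:
        prompts.append(" ".join(words) + suffix)
    return prompts


long_hypothesis = "<speaker:1> " + "hello " * 2000
segments = split_into_prompts(long_hypothesis)
print(len(segments), max(len(s) for s in segments))
```

Each resulting prompt can then be passed to the model separately, and the completions joined afterwards.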

Metrics

Fisher testing set

System	WER (%)	WDER (%)	cpWER (%)
USM + turn-to-diarize baseline	15.48	5.32	21.19
+ This model	-	3.28	18.37

Callhome testing set

System	WER (%)	WDER (%)	cpWER (%)
USM + turn-to-diarize baseline	15.36	7.72	24.39
+ This model	-	6.66	23.57
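
To put the tables in perspective, the relative error reduction over the baseline can be derived from the reported numbers. The snippet below is a small worked calculation; the values are copied from the two tables above and the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Returns the relative reduction in percent, given two error rates."""
    return 100.0 * (baseline - improved) / baseline


# (baseline, with this model), copied from the tables above.
RESULTS = {
    ("Fisher", "WDER"): (5.32, 3.28),
    ("Fisher", "cpWER"): (21.19, 18.37),
    ("Callhome", "WDER"): (7.72, 6.66),
    ("Callhome", "cpWER"): (24.39, 23.57),
}

for (test_set, metric), (baseline, improved) in RESULTS.items():
    reduction = relative_reduction(baseline, improved)
    print(f"{test_set} {metric}: {reduction:.1f}% relative reduction")
```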

Usage

First, you need to install two packages:

pip install transformers diarizationlm

On a machine with a GPU and CUDA, you can use the model by running the following script:

from transformers import LlamaForCausalLM, AutoTokenizer
from diarizationlm import utils

HYPOTHESIS = """<speaker:1> Hello, how are you doing <speaker:2> today? I am doing well. What about <speaker:1> you? I'm doing well, too. Thank you."""

print("Loading model...")
# The tokenizer runs on the CPU, so it does not need a device_map.
tokenizer = AutoTokenizer.from_pretrained("google/DiarizationLM-8b-Fisher-v2")
model = LlamaForCausalLM.from_pretrained("google/DiarizationLM-8b-Fisher-v2", device_map="cuda")

print("Tokenizing input...")
inputs = tokenizer([HYPOTHESIS + " --> "], return_tensors = "pt").to("cuda")

print("Generating completion...")
# max_new_tokens must be an integer; allow a completion up to 20% longer than the prompt.
outputs = model.generate(**inputs,
                         max_new_tokens = int(inputs.input_ids.shape[1] * 1.2),
                         use_cache = False)

print("Decoding completion...")
completion = tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:],
                                    skip_special_tokens = True)[0]

print("Transferring completion to hypothesis text...")
transferred_completion = utils.transfer_llm_completion(completion, HYPOTHESIS)

print("========================================")
print("Hypothesis:", HYPOTHESIS)
print("========================================")
print("Completion:", completion)
print("========================================")
print("Transferred completion:", transferred_completion)
print("========================================")

The output will look like this:

Loading model...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:13<00:00,  3.32s/it]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 172/172 [00:00<00:00, 992kB/s]
Tokenizing input...
Generating completion...
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Decoding completion...
Transferring completion to hypothesis text...
========================================
Hypothesis: <speaker:1> Hello, how are you doing <speaker:2> today? I am doing well. What about <speaker:1> you? I'm doing well, too. Thank you.
========================================
Completion:  <speaker:1> Hello, how are you doing today? <speaker:2> I am doing well. What about you? <speaker:1> I'm doing well, too. Thank you. [eod] [eod] <speaker:1
========================================
Transferred completion: <speaker:1> Hello, how are you doing today? <speaker:2> I am doing well. What about you? <speaker:1> I'm doing well, too. Thank you.
========================================
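
The raw completion above ends with stray "[eod]" markers and a truncated speaker tag, which is why the script transfers the speaker labels back onto the original hypothesis words. The following is a simplified sketch of that idea, not the implementation of utils.transfer_llm_completion: the helper names are hypothetical, and Python's difflib is used as a stand-in word alignment.

```python
import difflib
import re

SPEAKER_TAG = re.compile(r"<speaker:(\d+)>")


def parse_words_and_speakers(text):
    """Returns parallel lists of words and speaker ids, dropping markers."""
    words, speakers, current = [], [], "1"
    for token in text.split():
        match = SPEAKER_TAG.fullmatch(token)
        if match:
            current = match.group(1)
        elif token == "[eod]" or token.startswith("<speaker:"):
            continue  # end-of-document marker or truncated speaker tag
        else:
            words.append(token)
            speakers.append(current)
    return words, speakers


def transfer_speakers(completion, hypothesis):
    """Copies speaker labels from aligned completion words onto hypothesis words."""
    hyp_words, hyp_speakers = parse_words_and_speakers(hypothesis)
    comp_words, comp_speakers = parse_words_and_speakers(completion)
    matcher = difflib.SequenceMatcher(a=hyp_words, b=comp_words, autojunk=False)
    new_speakers = list(hyp_speakers)  # unmatched words keep their old label
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            new_speakers[block.a + offset] = comp_speakers[block.b + offset]
    # Rebuild the text, emitting a tag whenever the speaker changes.
    pieces, previous = [], None
    for word, speaker in zip(hyp_words, new_speakers):
        if speaker != previous:
            pieces.append(f"<speaker:{speaker}>")
            previous = speaker
        pieces.append(word)
    return " ".join(pieces)


hypothesis = ("<speaker:1> Hello, how are you doing <speaker:2> today? I am doing "
              "well. What about <speaker:1> you? I'm doing well, too. Thank you.")
completion = (" <speaker:1> Hello, how are you doing today? <speaker:2> I am doing "
              "well. What about you? <speaker:1> I'm doing well, too. Thank you. "
              "[eod] [eod] <speaker:1")
print(transfer_speakers(completion, hypothesis))
```

Because the words of the hypothesis are kept and only the labels change, any word edits the model makes in its completion do not leak into the final transcript.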

Citation

Our paper is cited as:

@article{wang2024diarizationlm,
  title={{DiarizationLM: Speaker Diarization Post-Processing with Large Language Models}},
  author={Quan Wang and Yiling Huang and Guanlong Zhao and Evan Clark and Wei Xia and Hank Liao},
  journal={arXiv preprint arXiv:2401.03506},
  year={2024}
}