rinna/japanese-hubert-base
This is a Japanese HuBERT Base model trained by rinna Co., Ltd.
Model summary
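The model architecture is the same as the original HuBERT Base model, with 12 Transformer layers and 12 attention heads. The model was trained with code from the official repository; the detailed training configuration can be found in that repository and in the original paper.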
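To make the "12 layers, 12 heads" figure concrete, the sketch below counts the parameters of one Transformer encoder layer and of the 12-layer stack. It assumes a hidden size of 768 (the last dimension of `last_hidden_state` in the usage example), a feed-forward size of 3072 (an assumption: the usual Base-size value, not stated in this card), biases on every linear layer, and two LayerNorms per layer. It covers only the Transformer layers, not the convolutional feature encoder, feature projection, or positional convolution, so it is not the full model size. The 12 heads split the 768 dimensions into 12 slices of 64 without changing the parameter count.

```python
# Rough parameter count for the Transformer layers of a HuBERT-Base-style encoder.
# Assumptions (not from the model card): d_ff = 3072, biased linear layers,
# one LayerNorm after attention and one after the feed-forward block.

def linear_params(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a fully connected layer."""
    return n_in * n_out + n_out

def layer_norm_params(d: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * d

def encoder_layer_params(d_model: int, d_ff: int) -> int:
    # q, k, v and output projections; the head count only reshapes these matrices
    attention = 4 * linear_params(d_model, d_model)
    # two-layer feed-forward block: d_model -> d_ff -> d_model
    feed_forward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    norms = 2 * layer_norm_params(d_model)
    return attention + feed_forward + norms

D_MODEL, D_FF, N_LAYERS, N_HEADS = 768, 3072, 12, 12
head_dim = D_MODEL // N_HEADS
per_layer = encoder_layer_params(D_MODEL, D_FF)
total = N_LAYERS * per_layer

print(f"head dim: {head_dim}")                 # 64
print(f"params per layer: {per_layer:,}")      # 7,087,872
print(f"params in {N_LAYERS} layers: {total:,}")  # 85,054,464
```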
Training
The model was trained on ReazonSpeech v1, a Japanese speech corpus of approximately 19,000 hours.
# coding = utf-8
import argparse

import torch
from transformers import HubertModel
from openmind import is_torch_npu_available

parser = argparse.ArgumentParser(description="Extract HuBERT features from a 16 kHz waveform")
parser.add_argument("--model_name_or_path", type=str, default="./")
args = parser.parse_args()
model_path = args.model_name_or_path

# Use the Ascend NPU when available, otherwise fall back to the CPU
if is_torch_npu_available():
    import torch_npu  # noqa: F401  registers the "npu" device with PyTorch
    device = "npu:0"
else:
    device = "cpu"

model = HubertModel.from_pretrained(model_path)
model = model.to(device)
model.eval()

# Dummy input: a batch of one waveform with 10,000 samples at 16 kHz
wav_input_16khz = torch.randn(1, 10000)
with torch.no_grad():
    outputs = model(wav_input_16khz.to(device))
print(f"Input: {wav_input_16khz.size()}")  # [1, 10000]
print(f"Output: {outputs.last_hidden_state.size()}")  # [1, 31, 768]
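The sequence length of 31 in the output shape above comes from the convolutional feature encoder that sits in front of the Transformer layers. The sketch below reproduces it by applying the unpadded convolution length formula layer by layer. The kernel sizes and strides are an assumption taken from the `HubertConfig` defaults in transformers (`conv_kernel`, `conv_stride`), not from this card; their combined stride is 320 samples, i.e. one frame per 20 ms at 16 kHz.

```python
# Output frame count of the HuBERT convolutional feature encoder.
# Assumed layer layout (HubertConfig defaults in transformers):
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)

def num_output_frames(num_samples: int) -> int:
    """Apply the 'valid' (no padding) convolution length formula per layer."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length

print(num_output_frames(10_000))  # 31, the sequence length printed above
print(num_output_frames(16_000))  # 49 frames for one second of 16 kHz audio
```

Inputs shorter than the encoder's 400-sample receptive field produce no frames, so very short clips need padding before being passed to the model.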
A fairseq checkpoint file is also available.
@inproceedings{sawada2024release,
title = {Release of Pre-Trained Models for the {J}apanese Language},
author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = {5},
year = {2024},
pages = {13898--13905},
url = {https://aclanthology.org/2024.lrec-main.1213},
note = {\url{https://arxiv.org/abs/2404.01657}}
}
@article{hsu2021hubert,
author = {Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title = {HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units},
year = {2021},
volume = {29},
pages = {3451--3460},
doi = {10.1109/TASLP.2021.3122291}
}