rinna/japanese-hubert-base

Overview

This is a Japanese HuBERT Base model trained by rinna Co., Ltd.

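  • Model summary

    The model architecture is the same as the original HuBERT Base model, with 12 Transformer layers and 12 attention heads. The model was trained using code from the official repository; the detailed training configuration can be found in that repository and in the original paper.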

  • Training

    The model was trained on approximately 19,000 hours of the Japanese speech corpus ReazonSpeech v1.
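
As a quick sanity check on the model summary above, the sketch below compares a configuration dictionary against the stated architecture. The key names follow the `HubertConfig` fields in transformers; the hidden size of 768 is inferred from the output shape in the usage example rather than stated in the summary, and the helper names `architecture_mismatches` and `load_config` are hypothetical.

```python
import json
from pathlib import Path

# Expected values: 12 layers and 12 heads come from the model summary;
# hidden_size=768 is an assumption inferred from the [1, 31, 768] output shape.
EXPECTED = {
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "hidden_size": 768,
}


def architecture_mismatches(config: dict) -> dict:
    """Return {key: (expected, actual)} for every entry that differs."""
    return {
        key: (expected, config.get(key))
        for key, expected in EXPECTED.items()
        if config.get(key) != expected
    }


def load_config(model_dir: str) -> dict:
    """Read config.json from a local model directory (hypothetical helper)."""
    path = Path(model_dir) / "config.json"
    return json.loads(path.read_text(encoding="utf-8"))


# In-memory example; with a downloaded model, pass load_config("./") instead.
sample_config = {"num_hidden_layers": 12, "num_attention_heads": 12, "hidden_size": 768}
print(architecture_mismatches(sample_config))  # {}
```

An empty dictionary means the configuration agrees with the summary; any other result lists the differing fields as (expected, actual) pairs.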


How to use the model

# coding=utf-8
import argparse

import torch
from openmind import is_torch_npu_available
from transformers import HubertModel

parser = argparse.ArgumentParser(description="Run japanese-hubert-base on a dummy waveform")
parser.add_argument("--model_name_or_path", type=str, default="./")
args = parser.parse_args()
model_path = args.model_name_or_path

# Use the Ascend NPU when it is available; otherwise fall back to the CPU.
if is_torch_npu_available():
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    device = "npu:0"
else:
    device = "cpu"

model = HubertModel.from_pretrained(model_path)
model = model.to(device)
model.eval()

# Dummy input: 10,000 random samples standing in for a 16 kHz waveform.
wav_input_16khz = torch.randn(1, 10000)
# Move the input to the same device as the model; no gradients are needed for inference.
with torch.no_grad():
    outputs = model(wav_input_16khz.to(device))
print(f"Input:   {wav_input_16khz.size()}")  # [1, 10000]
print(f"Output:  {outputs.last_hidden_state.size()}")  # [1, 31, 768]

A fairseq checkpoint file is also available.
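
The output shape in the usage example (10,000 input samples giving 31 frames) follows from the convolutional feature extractor in front of the Transformer layers. The sketch below reproduces that arithmetic, assuming the kernel sizes and strides of the original HuBERT Base design (total stride 320, i.e. one frame per 20 ms at 16 kHz); the name `num_output_frames` is hypothetical and the values are not read from this model's configuration.

```python
# (kernel_size, stride) per convolutional layer, as in the original
# HuBERT Base design. Assumed here, not read from this model's config.json.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_output_frames(num_samples: int) -> int:
    """Number of feature frames produced for a waveform of num_samples samples.

    Each unpadded 1-D convolution maps length L to (L - kernel) // stride + 1.
    Assumes the input covers at least one receptive field (400 samples).
    """
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length


print(num_output_frames(10000))  # 31, matching the [1, 31, 768] output above
print(num_output_frames(16000))  # 49 frames for one second of 16 kHz audio
```

This is useful when aligning frame-level features with timestamps or labels: frame i starts at sample 320 * i.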


How to cite


@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

References

@article{hsu2021hubert,
    author = {Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    title = {HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units},
    year = {2021},
    volume = {29},
    pages = {3451--3460},
    doi = {10.1109/TASLP.2021.3122291}
}

License

The Apache 2.0 license
