
Model name: open-vakgyata

Model overview: open-vakgyata is an open-source language identification model that detects and classifies Indian languages from speech input.

Supported languages:

Language           Code
English (India)    en-IN
Hindi              hi-IN
Odia               or-IN
Bengali            bn-IN
Tamil              ta-IN
Telugu             te-IN
Kannada            kn-IN
Malayalam          ml-IN
Marathi            mr-IN
Gujarati           gu-IN
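
For display purposes, the language codes above can be mapped back to readable names. The following is a minimal sketch; `LANGUAGE_NAMES` and `display_name` are hypothetical helpers, not part of the model, and it assumes the model's labels use the same codes as the table, which the source does not confirm.

```python
# Readable names for the language codes listed in the table above.
# Assumption: the model's output labels match these codes exactly.
LANGUAGE_NAMES = {
    "en-IN": "English (India)",
    "hi-IN": "Hindi",
    "or-IN": "Odia",
    "bn-IN": "Bengali",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "gu-IN": "Gujarati",
}


def display_name(code: str) -> str:
    """Return a readable name for a language code, or the code itself if unknown."""
    return LANGUAGE_NAMES.get(code, code)
```

Falling back to the raw code keeps the helper usable even if the model emits a label that is not in the table.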

Specifications

  • Supported sampling rate: 16,000 Hz
  • Recommended audio format: 16 kHz, 16-bit PCM
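
Before running inference, it can help to confirm that a WAV file matches the recommended format. The sketch below uses only Python's standard `wave` module; `check_wav_format` is a hypothetical helper name, and it handles uncompressed PCM WAV files only.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_width_bytes=2):
    """Return a list of mismatches between a WAV file and the recommended format."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample; 2 bytes = 16-bit PCM
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if width != expected_width_bytes:
            problems.append(
                f"sample width is {8 * width} bits, expected {8 * expected_width_bytes} bits"
            )
    return problems
```

An empty list means the file already matches 16 kHz, 16-bit PCM; otherwise the file would need to be resampled or converted before being passed to the model.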

Usage:

from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

device = "cpu"  # set to "cuda" to run on a GPU

model_id = "onecxi/open-vakgyata"

processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id).to(device)

Inference:

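import torchaudio

# Load the audio; the waveform has shape (channels, samples)
audio, sr = torchaudio.load("path/to/audio.wav")

# Downmix to mono and resample to the 16 kHz rate the model expects
audio = audio.mean(dim=0)
if sr != 16000:
    audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=16000)
    sr = 16000

# Process the waveform and move to the appropriate device
inputs = processor(audio.numpy(), sampling_rate=sr, return_tensors="pt").to(device)

# Perform inference
with torch.no_grad():
    logits = model(**inputs).logits

# Get language probabilities and pick the most likely label
probs = logits.softmax(dim=-1).squeeze(0).cpu()
language = model.config.id2label[int(probs.argmax())]

print(language)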
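
The inference code above reports only the single most likely language. When the runner-up candidates matter, for example with short or noisy clips, the logits can be ranked instead. This is a minimal sketch in plain Python; `top_k_languages` is a hypothetical helper, and the example logits and labels are made up for illustration.

```python
import math


def top_k_languages(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical values for illustration only; real logits come from the model.
example_logits = [0.1, 2.0, -1.0]
example_labels = {0: "en-IN", 1: "hi-IN", 2: "ta-IN"}
print(top_k_languages(example_logits, example_labels, k=2))
```

With the real model, the same function could be called as `top_k_languages(logits[0].tolist(), model.config.id2label)`, assuming a single audio clip in the batch.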

Citation

If you use this model in your research or application, please consider citing the model and its underlying sources:

@misc{vakgyata2024,
  title={vakgyata: Language Identification for Indian Speech},
  author={OneCXI},
  year={2024},
  url={https://huggingface.co/onecxi/open-vakgyata}
}
