Ascend-SACT/VocalSound

Dataset Introduction

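VocalSound is a free audio dataset for accuracy testing. It contains 21,024 sound clips recorded by 3,365 crowdsourced participants, covering laughter, sighs, coughs, throat clearing, sneezes, and sniffs. The dataset also includes meta information about each speaker, such as age, gender, native language, country, and health condition.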

The VocalSound dataset is available in two versions: 16 kHz (1.7 GB) and 44.1 kHz (4.5 GB). Audio samples can be previewed online.
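To check which of the two versions a given clip comes from, its sample rate can be read from the WAV header. The following is a minimal sketch using Python's standard `wave` module. The helper names `sample_rate_of` and `release_of` are hypothetical, and the sketch assumes the clips are ordinary PCM WAV files.

```python
import wave


def sample_rate_of(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def release_of(path):
    """Guess the release a clip belongs to from its sample rate (hypothetical helper)."""
    rate = sample_rate_of(path)
    if rate == 16000:
        return "16k"
    if rate == 44100:
        return "44k"
    # Any other rate is not one of the two published versions.
    return "unknown"
```

Calling `release_of` on a clip from the 16 kHz version should give `"16k"`, provided the assumption about the file format holds.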

This repository hosts four zip packages:

1. The complete dataset provided by the original GitHub dataset repository

GitHub-YuanGongND/vocalsound

vs_release_16k.zip
vs_release_44k.zip

The dataset directory structure is as follows:

data
├──readme.txt
├──class_labels_indices_vs.csv # include label code and name information
├──audio_16k
│  ├──f0003_0_cough.wav # female speaker, id=0003, 0=first collection (most spks only record once, but there are exceptions), cough
│  ├──f0003_0_laughter.wav
│  ├──f0003_0_sigh.wav
│  ├──f0003_0_sneeze.wav
│  ├──f0003_0_sniff.wav
│  ├──f0003_0_throatclearing.wav
│  ├──f0004_0_cough.wav # data from another female speaker 0004
│   ... (21024 files in total)
│   
├──audio_44k
│    # same recordings as those in data/audio_16k, but not downsampled
│   ├──f0003_0_cough.wav
│    ... (21024 files in total)
│
├──datafiles  # json datafiles that we use in our baseline experiment, you can ignore it if you don't use our training pipeline
│  ├──all.json  # all data
│  ├──te.json  # test data
│  ├──tr.json  # training data
│  ├──val.json  # validation data
│  └──subtest # subset of the test set, for fine-grained evaluation
│     ├──te_age1.json  # age [18-25]
│     ├──te_age2.json  # age [26-48]
│     ├──te_age3.json  # age [49-80]
│     ├──te_female.json
│     └──te_male.json
│
└──meta  # Meta information of the speakers [spk_id, gender, age, country, native language, health condition (no=no problem)]
   ├──all_meta.json  # all data
   ├──te_meta.json  # test data
   ├──tr_meta.json  # training data
   └──val_meta.json  # validation data
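The file naming convention shown in the tree (for example `f0003_0_cough.wav`) can be decoded programmatically. The following is a minimal sketch, not part of the dataset tooling. The names `ClipInfo` and `parse_clip_name` are hypothetical, and the `m` prefix for male speakers is an assumption, since the listing above only shows `f` examples.

```python
import re
from typing import NamedTuple

# Pattern inferred from names such as "f0003_0_cough.wav".
# ASSUMPTION: male speakers use the prefix "m"; the listing only shows "f".
_CLIP_NAME = re.compile(
    r"^(?P<gender>[fm])(?P<speaker>\d+)_(?P<collection>\d+)_(?P<label>[a-z]+)\.wav$"
)


class ClipInfo(NamedTuple):
    gender: str       # "female" or "male"
    speaker_id: str   # kept as a string to preserve leading zeros
    collection: int   # 0 = first collection
    label: str        # e.g. "cough", "throatclearing"


def parse_clip_name(filename):
    """Split a VocalSound clip file name into its components."""
    match = _CLIP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected clip name: {filename!r}")
    return ClipInfo(
        gender="female" if match["gender"] == "f" else "male",
        speaker_id=match["speaker"],
        collection=int(match["collection"]),
        label=match["label"],
    )
```

For example, `parse_clip_name("f0003_0_cough.wav")` yields a female speaker with id `0003`, collection `0`, and label `cough`. Names that do not follow the pattern raise `ValueError` rather than being silently skipped.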

2. The complete dataset provided on Hugging Face

HuggingFace-MahiA/VocalSound

VocalSound.zip

3. A data subset sourced from Hugging Face and adapted for ais-bench (the file is corrupted and is not currently provided)

HuggingFace-maoxx241/audio_vocalsound_16k_subset

audio_vocalsound_16k_subset.zip

Download this archive to the {tool root path}/ais_bench/datasets directory, then run the following commands:

cd ais_bench/datasets
unzip audio_vocalsound_16k_subset.zip
mv audio_vocalsound_16k_subset vocalsound
mv vocalsound/subset1/* vocalsound/
mv vocalsound/subset2/* vocalsound/
mv vocalsound/subset3/* vocalsound/
mv vocalsound/subset4/* vocalsound/
mv vocalsound/subset5/* vocalsound/
rm audio_vocalsound_16k_subset.zip

Dataset Usage

For details, refer to AISBench/benchmark.

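Step 1: The extracted directory layout differs slightly between the zip packages. Whichever package you use, place all of the wav files it contains under {tool root path}/ais_bench/datasets/vocalsound.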
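Because the layout differs between packages, one way to carry out this step is to search the extracted directory recursively and copy every wav file into the target directory. The following is a minimal sketch under two assumptions: the target directory is meant to be flat (as the `mv` commands for the subset package suggest), and file names are unique within a package. The function name `collect_wavs` is hypothetical.

```python
import shutil
from pathlib import Path


def collect_wavs(extracted_dir, target_dir):
    """Copy every .wav file under extracted_dir into target_dir; return the number copied."""
    source = Path(extracted_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = 0
    # sorted() materialises the list first, so files copied during the loop
    # are not picked up again even if target_dir lies inside extracted_dir.
    for wav_path in sorted(source.rglob("*.wav")):
        destination = target / wav_path.name
        if destination.exists():
            # ASSUMPTION: a file with the same name is the same clip, so skip it.
            continue
        shutil.copy2(wav_path, destination)
        copied += 1
    return copied
```

For a complete package the returned count can be compared against the 21,024 clips stated in the introduction. A lower number suggests an incomplete extraction or name collisions.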

Step 2: Launch the accuracy test using the vocalsound_gen_base64 task. Example command: ais_bench --models vllm_api_stream_chat --datasets vocalsound_gen_base64 --debug