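VocalSound is a free dataset used for audio accuracy evaluation. It contains 21,024 clips recorded by 3,365 crowdsourced participants, covering laughter, sighs, coughs, throat clearing, sneezes, and sniffs. The dataset also includes speaker metadata such as age, gender, native language, country, and health condition.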
VocalSound comes in two versions: 16 kHz (1.7 GB) and 44.1 kHz (4.5 GB).
This repository contains 4 zip packages:
vs_release_16k.zip
vs_release_44k.zip
The dataset directory structure is as follows:
data
├──readme.txt
├──class_labels_indices_vs.csv # label codes and names
├──audio_16k
│ ├──f0003_0_cough.wav # female speaker, id=0003, 0=first collection (most speakers record only once, but there are exceptions), cough
│ ├──f0003_0_laughter.wav
│ ├──f0003_0_sigh.wav
│ ├──f0003_0_sneeze.wav
│ ├──f0003_0_sniff.wav
│ ├──f0003_0_throatclearing.wav
│ ├──f0004_0_cough.wav # data from another female speaker 0004
│ ... (21024 files in total)
│
├──audio_44k
│ # same recordings as those in data/audio_16k, but not downsampled
│ ├──f0003_0_cough.wav
│ ... (21024 files in total)
│
├──datafiles # JSON datafiles used in our baseline experiments; you can ignore them if you don't use our training pipeline
│ ├──all.json # all data
│ ├──te.json # test data
│ ├──tr.json # training data
│ ├──val.json # validation data
│ └──subtest # subset of the test set, for fine-grained evaluation
│   ├──te_age1.json # age [18-25]
│   ├──te_age2.json # age [26-48]
│   ├──te_age3.json # age [49-80]
│   ├──te_female.json
│   └──te_male.json
│
└──meta # Meta information of the speakers [spk_id, gender, age, country, native language, health condition (no=no problem)]
  ├──all_meta.json # all data
  ├──te_meta.json # test data
  ├──tr_meta.json # training data
  └──val_meta.json # validation data

VocalSound.zip

HuggingFace: maoxx241/audio_vocalsound_16k_subset
audio_vocalsound_16k_subset.zip: download this archive to the {tool root path}/ais_bench/datasets directory, then run the following commands:
cd ais_bench/datasets
unzip audio_vocalsound_16k_subset.zip
mv audio_vocalsound_16k_subset vocalsound
mv vocalsound/subset1/* vocalsound/
mv vocalsound/subset2/* vocalsound/
mv vocalsound/subset3/* vocalsound/
mv vocalsound/subset4/* vocalsound/
mv vocalsound/subset5/* vocalsound/
rm audio_vocalsound_16k_subset.zip
For details, see AISBench/benchmark.
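The five per-subset mv commands above can also be written as a loop. This is a minimal sketch: flatten_vocalsound_subsets is a hypothetical helper name, not part of AISBench, and it assumes the archive unpacks into subset1 through subset5 as the commands above imply.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AISBench): moves the contents of
# <dataset_dir>/subset1 ... subset5 up into <dataset_dir>, equivalent to the
# five "mv vocalsound/subsetN/* vocalsound/" commands.
flatten_vocalsound_subsets() {
  local dataset_dir="$1"
  local i subset_dir
  for i in 1 2 3 4 5; do
    subset_dir="$dataset_dir/subset$i"
    # Skip a subset directory that does not exist instead of failing.
    [ -d "$subset_dir" ] || continue
    # Only call mv when the glob matches something (compgen -G returns
    # non-zero when nothing matches).
    if compgen -G "$subset_dir/*" > /dev/null; then
      mv "$subset_dir"/* "$dataset_dir"/
    fi
  done
}
```

Usage, from {tool root path}: cd ais_bench/datasets, run unzip and mv audio_vocalsound_16k_subset vocalsound as above, then call flatten_vocalsound_subsets vocalsound and remove the zip. Like the original commands, the * glob does not move hidden files, and the emptied subsetN directories are left in place.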
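Step 1: The file paths after extraction differ slightly between the zip packages. Whichever package you use, place the full set of .wav files it contains under {tool root path}/ais_bench/datasets/vocalsound.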
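To sanity-check Step 1, the sketch below counts the .wav files in a directory and tallies them by label. It relies on the naming convention shown in the directory listing (for example, f0003_0_cough.wav is female speaker 0003, first collection, label cough) and assumes the label is the text after the last underscore. The function name summarize_vocalsound_dir is hypothetical, not part of AISBench, and the script assumes bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of AISBench): count the .wav files directly in
# a directory and tally them by the label encoded in each file name,
# e.g. f0003_0_cough.wav -> "cough".
summarize_vocalsound_dir() {
  local dir="$1"
  local total=0
  local path name label
  declare -A counts=()
  for path in "$dir"/*.wav; do
    [ -e "$path" ] || continue        # no match: the glob stays literal
    name="$(basename "$path" .wav)"   # strip directory and .wav suffix
    label="${name##*_}"               # text after the last underscore
    counts["$label"]=$(( ${counts["$label"]:-0} + 1 ))
    total=$(( total + 1 ))
  done
  echo "total $total"
  # One "label count" line per label, sorted by label name.
  for label in "${!counts[@]}"; do
    echo "$label ${counts[$label]}"
  done | sort
}
```

Run it from {tool root path} as summarize_vocalsound_dir ais_bench/datasets/vocalsound. If you placed the full dataset, the total should match the 21,024 clips stated above.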
Step 2: Launch the accuracy evaluation with the vocalsound_gen_base64 task, for example:
ais_bench --models vllm_api_stream_chat --datasets vocalsound_gen_base64 --debug