
Global PIQA v0.1

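Global PIQA is a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by 335 researchers from 65 countries around the world. The 116 language varieties in Global PIQA cover five continents, 14 language families, and 23 writing systems. In the non-parallel split of Global PIQA, over 50% of examples reference local foods, customs, traditions, or other culturally specific elements. For details, see our preprint: Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures (2025).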

Map of the 116 languages in Global PIQA.

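Following the English PIQA dataset (Bisk et al., 2020), each example consists of a prompt and two candidate solutions, one correct and one incorrect. Determining the correct solution requires physical commonsense reasoning, although we define physical commonsense flexibly (e.g., knowledge of the physical properties of objects, affordances, physical and temporal relations, and everyday activities). Beyond evaluating large language models (LLMs), we hope that Global PIQA offers a glimpse into the rich variety of cultures in which human language is embedded.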
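
As a rough illustration of that layout, the sketch below models one example as a small Python record. The field names (`prompt`, `solution0`, `solution1`, `label`), the 0/1 label encoding, and the sample text are assumptions made for illustration and are not taken from the dataset; check the dataset viewer for the actual column names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiqaExample:
    """One PIQA-style item: a prompt plus two candidate solutions."""
    prompt: str
    solution0: str
    solution1: str
    label: int  # assumed encoding: index (0 or 1) of the correct solution

    def correct_solution(self) -> str:
        """Return the text of the solution that the label marks as correct."""
        return self.solution1 if self.label == 1 else self.solution0

    def is_correct(self, predicted: int) -> bool:
        """True if a predicted solution index matches the label."""
        return predicted == self.label


# Invented example text, for illustration only (not from Global PIQA).
example = PiqaExample(
    prompt="To keep a cut apple from browning,",
    solution0="rub the cut side with lemon juice.",
    solution1="rub the cut side with baking flour.",
    label=0,
)
print(example.correct_solution())
```

Because every item has exactly two candidates, a model that guesses at random is expected to score about 50%.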

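Join us! For Global PIQA v1, we plan to expand language coverage and add a parallel split of the dataset. If your language is not currently represented in Global PIQA, please fill out the interest form here!

License

Global PIQA is released under a CC BY-SA 4.0 license. However, we explicitly prohibit using Global PIQA, or synthetic data generated with Global PIQA as a seed, to train AI systems. Global PIQA is intended for LLM evaluation only.

Loading the dataset

Global PIQA can be loaded with the following code: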

from datasets import load_dataset

# As a Hugging Face dataset, for the English subset:
global_piqa_eng = load_dataset('mrlbenchmarks/global-piqa-nonparallel', 'eng_latn')['test']
for r in global_piqa_eng:
    print(r)
    break

# To convert to a Pandas DataFrame:
global_piqa_eng.set_format('pandas')
global_piqa_eng = global_piqa_eng[:]

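Evaluating with Global PIQA

As shown below, Global PIQA can be used to evaluate LLMs in either a completion format or a prompted format.

  • The completion format (for pretrained-only or "base" models) evaluates the probability the LLM assigns to each solution given the prompt, normalized by the solution's length in bytes. The example is marked correct if the LLM assigns a higher normalized probability to the correct solution than to the incorrect one.
  • The prompted format (for instruction-tuned models, e.g., most proprietary models) prompts the LLM with a multiple-choice template containing the prompt and both solutions, and asks it to choose option A or B (one for each solution).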
Evaluating an LLM on a Global PIQA example, using either the completion or prompted evaluation format.
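
The two formats can be sketched in plain Python as follows. This is a minimal illustration, not the harness implementation: it assumes the per-solution log-probabilities have already been obtained from a model, that normalization means dividing the log-probability by the solution's UTF-8 byte length, and that the multiple-choice template wording (which is hypothetical here) ends with an "Answer:" cue.

```python
def byte_normalized_score(logprob: float, solution: str) -> float:
    """Log-probability of a solution divided by its UTF-8 byte length."""
    return logprob / len(solution.encode("utf-8"))


def completion_prediction(logprobs, solutions) -> int:
    """Index of the solution with the highest byte-normalized score."""
    scores = [byte_normalized_score(lp, s) for lp, s in zip(logprobs, solutions)]
    return max(range(len(scores)), key=scores.__getitem__)


def prompted_template(prompt: str, solution0: str, solution1: str) -> str:
    """Hypothetical multiple-choice template with options A and B."""
    return f"{prompt}\nA. {solution0}\nB. {solution1}\nAnswer:"


def parse_choice(reply: str):
    """Map a reply that starts with 'A' or 'B' to a solution index, else None."""
    first = reply.strip()[:1].upper()
    return {"A": 0, "B": 1}.get(first)


# Made-up log-probabilities: the longer solution has the lower raw
# log-probability (-22 vs. -10) but the better per-byte score (-1.0 vs. -2.0).
solutions = ["short", "a much longer solution"]
print(completion_prediction([-10.0, -22.0], solutions))
```

Normalizing by byte length rather than token count keeps the score comparable across tokenizers, which matters when the same benchmark spans many scripts.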

Both evaluation formats are implemented in the LM Evaluation Harness:

# Install the harness, as in https://github.com/EleutherAI/lm-evaluation-harness
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

# Completion evaluation format:
lm_eval --model hf \
--model_args pretrained=[model_path] \
--tasks global_piqa_completions_[lang] \
--device cuda:0 \
--batch_size 8

# Prompted evaluation format:
lm_eval --model hf \
--model_args pretrained=[model_path] \
--tasks global_piqa_prompted_[lang] \
--device cuda:0 \
--batch_size 8
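
To run several languages in a row, the command lines above can be generated programmatically. The sketch below only builds the argument lists, following the `global_piqa_completions_[lang]` and `global_piqa_prompted_[lang]` task-name pattern shown above; the helper name `lm_eval_command` and the model path `my-org/my-model` are placeholders, not part of the harness.

```python
import shlex


def lm_eval_command(model_path, lang, fmt="completions",
                    device="cuda:0", batch_size=8):
    """Build the lm_eval argument list for one language and one format."""
    if fmt not in ("completions", "prompted"):
        raise ValueError(f"unknown evaluation format: {fmt!r}")
    return [
        "lm_eval", "--model", "hf",
        "--model_args", f"pretrained={model_path}",
        "--tasks", f"global_piqa_{fmt}_{lang}",
        "--device", device,
        "--batch_size", str(batch_size),
    ]


# Print one shell-ready command per language subset.
for lang in ("eng_latn", "swh_latn"):
    print(shlex.join(lm_eval_command("my-org/my-model", lang)))
```

Each list can also be passed directly to `subprocess.run` once the harness is installed.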

Included languages

Languages included in Global PIQA.
  • acm_arab (Iraqi Arabic, Gelet)
  • acq_arab (Yemeni Arabic)
  • aeb_arab (Tunisian Arabic)
  • afb_arab (Gulf Arabic)
  • als_latn (Northern Tosk Albanian)
  • amh_ethi (Amharic)
  • apc_arab_jord (Levantine Arabic, Jordan)
  • apc_arab_leba (Levantine Arabic, Lebanon)
  • apc_arab_pale (Levantine Arabic, Palestine)
  • apc_arab_syri (Levantine Arabic, Syria)
  • arb_arab (Modern Standard Arabic)
  • arq_arab (Algerian Arabic)
  • ars_arab (Najdi Arabic, Saudi Arabic)
  • ary_arab (Moroccan Arabic)
  • arz_arab (Egyptian Arabic)
  • asm_beng (Assamese)
  • azj_latn (North Azerbaijani)
  • bam_latn (Bambara)
  • bel_cyrl (Belarusian)
  • ben_beng (Bengali)
  • ben_latn (Bengali)
  • bho_deva (Bhojpuri)
  • bos_latn (Bosnian)
  • bsk_arab (Burushaski)
  • bul_cyrl (Bulgarian)
  • cat_latn (Catalan)
  • ces_latn (Czech)
  • ckb_arab (Central Kurdish)
  • ckm_latn (Chakavian)
  • cmn_hans (Mandarin Chinese, Simplified)
  • cmn_hant (Mandarin Chinese, Traditional)
  • deu_latn (German)
  • dhd_deva (Dhundari)
  • ekk_latn (Estonian)
  • ekp_latn (Ekpeye)
  • ell_grek (Greek)
  • eng_latn (English)
  • fao_latn (Faroese)
  • fin_latn (Finnish)
  • fra_latn_cana (French, Canada)
  • fra_latn_fran (French, France)
  • glg_latn (Galician)
  • guj_gujr (Gujarati)
  • hau_latn (Hausa)
  • haw_latn (Hawaiian, 'ōlelo Hawai'i)
  • heb_hebr (Hebrew)
  • hin_deva (Hindi)
  • hrv_latn (Croatian)
  • hun_latn (Hungarian)
  • hye_armn (Eastern Armenian)
  • ibo_latn (Igbo)
  • idu_latn (Idoma)
  • ind_latn (Indonesian)
  • isl_latn (Icelandic)
  • iso_latn (Isoko)
  • ita_latn (Italian)
  • jav_latn (Javanese)
  • jpn_jpan (Japanese)
  • kan_knda (Kannada)
  • kat_geor (Georgian)
  • kaz_cyrl (Kazakh)
  • kin_latn (Kinyarwanda)
  • kir_cyrl (Kyrgyz)
  • kor_hang (Korean)
  • lin_latn (Lingala)
  • lit_latn (Lithuanian)
  • luo_latn (Luo)
  • mal_mlym (Malayalam)
  • mar_deva (Marathi)
  • mkd_cyrl (Macedonian)
  • mni_beng (Manipuri)
  • mni_mtei (Manipuri)
  • nag_latn (Nagamese)
  • nld_latn (Dutch)
  • nno_latn (Norwegian Nynorsk)
  • nob_latn (Norwegian Bokmål)
  • npi_deva (Nepali)
  • pan_guru (Eastern Punjabi)
  • pcm_latn (Nigerian Pidgin, Naijá)
  • pes_arab (Western Persian)
  • pol_latn (Polish)
  • por_latn_braz (Portuguese, Brazil)
  • por_latn_port (Portuguese, Portugal)
  • ron_latn (Romanian)
  • rus_cyrl (Russian)
  • rwr_deva (Marwari)
  • sin_sinh (Sinhala)
  • slk_latn (Slovak)
  • slk_latn_sari (Šariš Slovak)
  • slv_latn (Slovenian)
  • slv_latn_cerk (Slovenian, Cerkno)
  • snd_arab (Sindhi)
  • snd_deva (Sindhi)
  • spa_latn_mexi (Spanish, Mexico)
  • spa_latn_peru (Spanish, Peru)
  • spa_latn_spai (Spanish, Peninsular)
  • srp_cyrl (Serbian)
  • srp_latn (Serbian)
  • swe_latn (Swedish)
  • swh_latn (Swahili)
  • tam_taml (Tamil)
  • tel_telu (Telugu)
  • tgl_latn (Tagalog/Filipino)
  • tha_thai (Thai)
  • tur_latn (Turkish)
  • uig_arab (Uyghur)
  • ukr_cyrl (Ukrainian)
  • urd_arab (Urdu)
  • urd_latn (Urdu)
  • urh_latn (Urhobo)
  • uzn_latn (Northern Uzbek)
  • vie_latn (Vietnamese)
  • yor_latn (Yoruba)
  • yue_hant (Yue Chinese, Cantonese)
  • zsm_latn (Standard Malay)
  • zul_latn (Zulu)
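
The subset names in the list above appear to follow a regular pattern: a three-letter language code, a four-letter script code, and an optional four-letter regional or dialect suffix, joined by underscores. The sketch below splits a name into those parts and groups subsets by script. The pattern is inferred from the list rather than documented, and `SubsetName`, `parse_subset`, and `group_by_script` are hypothetical helper names.

```python
from typing import NamedTuple, Optional


class SubsetName(NamedTuple):
    language: str            # e.g. "apc"
    script: str              # e.g. "arab"
    variety: Optional[str]   # e.g. "jord", or None when absent


def parse_subset(name: str) -> SubsetName:
    """Split a subset name such as 'apc_arab_jord' into its parts."""
    parts = name.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected subset name: {name!r}")
    variety = parts[2] if len(parts) == 3 else None
    return SubsetName(parts[0], parts[1], variety)


def group_by_script(names):
    """Map each script code to the subset names written in that script."""
    groups = {}
    for name in names:
        groups.setdefault(parse_subset(name).script, []).append(name)
    return groups


subsets = ["ben_beng", "ben_latn", "apc_arab_jord", "urd_arab", "urd_latn"]
print(group_by_script(subsets))
```

Grouping this way makes it easy to compare, for instance, a model's scores on the same language written in two scripts (`ben_beng` vs. `ben_latn`).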

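Dataset construction details

Methodological details are in our preprint: Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures (2025). Global PIQA was constructed as a shared task for the Multilingual Representation Learning (MRL) workshop at EMNLP 2025. Shared task participants contributed PIQA datasets in their native (or working) languages. The datasets were constructed by native speakers of each language, and all dataset contributors received authorship on the Global PIQA benchmark paper. This participatory approach is less exploitative than alternatives (e.g., hiring external annotators), is more likely to yield higher-quality datasets because NLP researchers build them directly, and gives language communities agency over how their datasets are constructed.

In the official non-parallel split of Global PIQA, 59.9% of examples are culturally specific, referencing local foods, clothing, customs, traditions, or other culturally specific elements. Only 3.5% of examples were written with the assistance of large language models (LLMs). All examples were manually verified by at least one native speaker of the corresponding language, and 72.9% were verified by more than one native speaker.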

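Limitations

  • Each language has only 100 examples; in the future, we hope this participatory approach to benchmark construction will enable larger datasets.
  • Although Global PIQA includes culturally specific examples, those examples reflect only the particular situations known to our authors and researchers and do not necessarily represent entire cultures. The dataset may contain cultural stereotypes, even though all examples were written by native speakers of the corresponding language.
  • We emphasize that more languages is not always better when building multilingual benchmarks; researchers should work with communities to determine whether and how each community wants its language included. In Global PIQA, we sought to work with native speakers as authors, giving them flexibility and agency in constructing their datasets.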
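
To give a sense of what 100 examples per language means for per-language scores, the sketch below computes the binomial standard error of an accuracy estimate and an approximate 95% interval. It uses the normal approximation and treats examples as independent, both of which are simplifying assumptions and not part of the Global PIQA methodology.

```python
import math


def accuracy_standard_error(accuracy: float, n: int = 100) -> float:
    """Binomial standard error of an accuracy measured on n examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def approx_95_interval(accuracy: float, n: int = 100):
    """Normal-approximation 95% interval, clipped to [0, 1]."""
    half_width = 1.96 * accuracy_standard_error(accuracy, n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


for acc in (0.5, 0.8):
    low, high = approx_95_interval(acc)
    print(f"accuracy={acc:.2f}  interval=({low:.3f}, {high:.3f})")
```

At 80% accuracy on 100 examples the standard error is about 0.04, so the interval spans roughly 72% to 88%; differences of a few points between models on a single language should be read with that in mind.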

Citation

@article{mrl-workshop-2025-global-piqa,
  title={Global {PIQA}: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures},
  author={Tyler A. Chang and Catherine Arnett and Abdelrahman Eldesokey and Abdelrahman Sadallah and Abeer Kashar and Abolade Daud and Abosede Grace Olanihun and Adamu Labaran Mohammed and Adeyemi Praise and Adhikarinayum Meerajita Sharma and Aditi Gupta and Afitab Iyigun and Afonso Simplício and Ahmed Essouaied and Aicha Chorana and Akhil Eppa and Akintunde Oladipo and Akshay Ramesh and Aleksei Dorkin and Alfred Malengo Kondoro and Alham Fikri Aji and Ali Eren Çetintaş and Allan Hanbury and Alou Dembele and Alp Niksarli and Álvaro Arroyo and Amin Bajand and Amol Khanna and Ana Chkhaidze and Ana Condez and Andiswa Mkhonto and Andrew Hoblitzell and Andrew Tran and Angelos Poulis and Anirban Majumder and Anna Vacalopoulou and Annette Kuuipolani Kanahele Wong and Annika Simonsen and Anton Kovalev and Ashvanth.S and Ayodeji Joseph Lana and Barkin Kinay and Bashar Alhafni and Benedict Cibalinda Busole and Bernard Ghanem and Bharti Nathani and Biljana Stojanovska Đurić and Bola Agbonile and Bragi Bergsson and Bruce Torres Fischer and Burak Tutar and Burcu Alakuş Çınar and Cade J. 
Kanoniakapueo Kane and Can Udomcharoenchaikit and Catherine Arnett and Chadi Helwe and Chaithra Reddy Nerella and Chen Cecilia Liu and Chiamaka Glory Nwokolo and Cristina España-Bonet and Cynthia Amol and DaeYeop Lee and Dana Arad and Daniil Dzenhaliou and Daria Pugacheva and Dasol Choi and Daud Abolade and David Liu and David Semedo and Deborah Popoola and Deividas Mataciunas and Delphine Nyaboke and Dhyuthy Krishna Kumar and Diogo Glória-Silva and Diogo Tavares and Divyanshu Goyal and DongGeon Lee and Ebele Nwamaka Anajemba and Egonu Ngozi Grace and Elena Mickel and Elena Tutubalina and Elias Herranen and Emile Anand and Emmanuel Habumuremyi and Emuobonuvie Maria Ajiboye and Eryawan Presma Yulianrifat and Esther Adenuga and Ewa Rudnicka and Faith Olabisi Itiola and Faran Taimoor Butt and Fathima Thekkekara and Fatima Haouari and Filbert Aurelian Tjiaranata and Firas Laakom and Francesca Grasso and Francesco Orabona and Francesco Periti and Gbenga Kayode Solomon and Gia Nghia Ngo and Gloria Udhehdhe-oze and Gonçalo Martins and Gopi Naga Sai Ram Challagolla and Guijin Son and Gulnaz Abdykadyrova and Hafsteinn Einarsson and Hai Hu and Hamidreza Saffari and Hamza Zaidi and Haopeng Zhang and Harethah Abu Shairah and Harry Vuong and Hele-Andra Kuulmets and Houda Bouamor and Hwanjo Yu and Iben Nyholm Debess and İbrahim Ethem Deveci and Ikhlasul Akmal Hanif and Ikhyun Cho and Inês Calvo and Inês Vieira and Isaac Manzi and Ismail Daud and Itay Itzhak and Iuliia (Julia) Alekseenko and Ivan Belashkin and Ivan Spada and Ivan Zhelyazkov and Jacob Brinton and Jafar Isbarov and Jaka Čibej and Jan Čuhel and Jan Kocoń and Jauza Akbar Krito and Jebish Purbey and Jennifer Mickel and Jennifer Za and Jenny Kunz and Jihae Jeong and Jimena Tena Dávalos and Jinu Lee and João Magalhães and John Yi and Jongin Kim and Joseph Chataignon and Joseph Marvin Imperial and Jubeerathan Thevakumar and Judith Land and Junchen Jiang and Jungwhan Kim and Kairit Sirts and Kamesh R and Kamesh V and 
Kanda Patrick Tshinu and Kätriin Kukk and Kaustubh Ponkshe and Kavsar Huseynova and Ke He and Kelly Buchanan and Kengatharaiyer Sarveswaran and Kerem Zaman and Khalil Mrini and Kian Kyars and Krister Kruusmaa and Kusum Chouhan and Lainitha Krishnakumar and Laura Castro Sánchez and Laura Porrino Moscoso and Leshem Choshen and Levent Sencan and Lilja Øvrelid and Lisa Alazraki and Lovina Ehimen-Ugbede and Luheerathan Thevakumar and Luxshan Thavarasa and Mahnoor Malik and Mamadou K. Keita and Mansi Jangid and Marco De Santis and Marcos García and Marek Suppa and Mariam D'Ciofalo and Marii Ojastu and Maryam Sikander and Mausami Narayan and Maximos Skandalis and Mehak Mehak and Mehmet İlteriş Bozkurt and Melaku Bayu Workie and Menan Velayuthan and Michael Leventhal and Michał Marcińczuk and Mirna Potočnjak and Mohammadamin Shafiei and Mridul Sharma and Mrityunjaya Indoria and Muhammad Ravi Shulthan Habibi and Murat Kolić and Nada Galant and Naphat Permpredanun and Narada Maugin and Nicholas Kluge Corrêa and Nikola Ljubešić and Nirmal Thomas and Nisansa de Silva and Nisheeth Joshi and Nitish Ponkshe and Nizar Habash and Nneoma C. 
Udeze and Noel Thomas and Noémi Ligeti-Nagy and Nouhoum Coulibaly and Nsengiyumva Faustin and Odunayo Kareemat Buliaminu and Odunayo Ogundepo and Oghojafor Godswill Fejiro and Ogundipe Blessing Funmilola and Okechukwu God'spraise and Olanrewaju Samuel and Olaoye Deborah Oluwaseun and Olasoji Akindejoye and Olga Popova and Olga Snissarenko and Onyinye Anulika Chiemezie and Orkun Kinay and Osman Tursun and Owoeye Tobiloba Moses and Oyelade Oluwafemi Joshua and Oyesanmi Fiyinfoluwa and Pablo Gamallo and Pablo Rodríguez Fernández and Palak Arora and Pedro Valente and Peter Rupnik and Philip Oghenesuowho Ekiugbo and Pramit Sahoo and Prokopis Prokopidis and Pua Niau-Puhipau and Quadri Yahya and Rachele Mignone and Raghav Singhal and Ram Mohan Rao Kadiyala and Raphael Merx and Rapheal Afolayan and Ratnavel Rajalakshmi and Rishav Ghosh and Romina Oji and Ron Kekeha Solis and Rui Guerra and Rushikesh Zawar and Sa'ad Nasir Bashir and Saeed Alzaabi and Sahil Sandeep and Sai Pavan Batchu and SaiSandeep Kantareddy and Salsabila Zahirah Pranida and Sam Buchanan and Samuel Rutunda and Sander Land and Sarah Sulollari and Sardar Ali and Saroj Sapkota and Saulius Tautvaisas and Sayambhu Sen and Sayantani Banerjee and Sebastien Diarra and SenthilNathan.M and Sewoong Lee and Shaan Shah and Shankar Venkitachalam and Sharifa Djurabaeva and Sharon Ibejih and Shivanya Shomir Dutta and Siddhant Gupta and Silvia Paniagua Suárez and Sina Ahmadi and Sivasuthan Sukumar and Siyuan Song and Snegha A. and Sokratis Sofianopoulos and Sona Elza Simon and Sonja Benčina and Sophie Gvasalia and Sphurti Kirit More and Spyros Dragazis and Stephan P. 
Kaufhold and Suba.S and Sultan AlRashed and Surangika Ranathunga and Taiga Someya and Taja Kuzman Pungeršek and Tal Haklay and Tasi'u Jibril and Tatsuya Aoyama and Tea Abashidze and Terenz Jomar Dela Cruz and Terra Blevins and Themistoklis Nikas and Theresa Dora Idoko and Thu Mai Do and Tilek Chubakov and Tommaso Gargiani and Uma Rathore and Uni Johannesen and Uwuma Doris Ugwu and Vallerie Alexandra Putra and Vanya Bannihatti Kumar and Varsha Jeyarajalingam and Varvara Arzt and Vasudevan Nedumpozhimana and Viktoria Ondrejova and Viktoryia Horbik and Vishnu Vardhan Reddy Kummitha and Vuk Dinić and Walelign Tewabe Sewunetie and Winston Wu and Xiaojing Zhao and Yacouba Diarra and Yaniv Nikankin and Yash Mathur and Yixi Chen and Yiyuan Li and Yolanda Xavier and Yonatan Belinkov and Yusuf Ismail Abayomi and Zaid Alyafeai and Zhengyang Shan and Zhi Rui Tam and Zilu Tang and Zuzana Nadova and Baber Abbasi and Stella Biderman and David Stap and Duygu Ataman and Fabian Schmidt and Hila Gonen and Jiayi Wang and David Ifeoluwa Adelani},  
  journal={Preprint},
  year={2025},
  url={https://arxiv.org/abs/2510.24081},
}