BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer

Akari Asai; Sneha Kudugunta; Xinyan Yu; Terra Blevins; Hila Gonen; Machel Reid; Yulia Tsvetkov; Sebastian Ruder; Hannaneh Hajishirzi

doi:10.18653/v1/2024.naacl-long.100

Abstract

Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructions. Using BUFFET, we perform thorough evaluations of ten state-of-the-art multilingual large language models with different transfer methods, namely in-context learning and fine-tuning. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer. Strong multilingual pre-trained or instruction-tuned models such as BLOOM or ChatGPT often lag behind much smaller mT5-base models given the same number of few-shot samples, particularly in low-resource languages. Our analysis suggests avenues for future research in few-shot cross-lingual transfer.

Anthology ID: 2024.naacl-long.100
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1771–1800
URL: https://aclanthology.org/2024.naacl-long.100
DOI: 10.18653/v1/2024.naacl-long.100
Cite (ACL): Akari Asai, Sneha Kudugunta, Xinyan Yu, Terra Blevins, Hila Gonen, Machel Reid, Yulia Tsvetkov, Sebastian Ruder, and Hannaneh Hajishirzi. 2024. BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1771–1800, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer (Asai et al., NAACL 2024)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.naacl-long.100.pdf
