Geo-Cultural Representation and Inclusion in Language Technologies

Sunipa Dev, Rida Qadri


Abstract
Training and evaluation of language models increasingly rely on semi-structured, human-annotated data, and techniques such as RLHF are seeing growing use. As a result, both the data and the human perspectives involved in this process play a key role in what our models take as ground truth. As annotation tasks become more subjective and culturally complex, it is unclear how much of their socio-cultural identity annotators draw on when responding to tasks. We also currently lack ways to integrate rich and diverse community perspectives into our language technologies. Accounting for such cross-cultural differences in interacting with technology is an increasingly crucial step for evaluating AI harms holistically. Without this, the state-of-the-art AI models being deployed risk causing unprecedented biases at a global scale. In this tutorial, we will take an interactive approach, using several types of annotation tasks to investigate together how our different socio-cultural perspectives and lived experiences influence what we consider appropriate representations of global concepts.
Anthology ID:
2024.lrec-tutorials.2
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): Tutorial Summaries
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Roman Klinger, Naoaki Okazaki, Nicoletta Calzolari, Min-Yen Kan
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
9–12
URL:
https://aclanthology.org/2024.lrec-tutorials.2
Cite (ACL):
Sunipa Dev and Rida Qadri. 2024. Geo-Cultural Representation and Inclusion in Language Technologies. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): Tutorial Summaries, pages 9–12, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Geo-Cultural Representation and Inclusion in Language Technologies (Dev & Qadri, LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2024.lrec-tutorials.2.pdf