Towards Measuring and Modeling “Culture” in LLMs: A Survey

Muhammad Farid Adilazuarda, Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Shivdutt Singh, Alham Fikri Aji, Jacki O’Neill, Ashutosh Modi, Monojit Choudhury


Abstract
We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs). We observe that none of the studies explicitly define “culture, which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of “culture”. We call these aspects the proxies of culture, and organize them across two dimensions of demographic and semantic proxies. We also categorize the probing methods employed. Our analysis indicates that only certain aspects of “culture,” such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness of probing techniques and situated studies on the impact of cultural mis- and under-representation in LLM-based applications.
Anthology ID:
2024.emnlp-main.882
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
15763–15784
Language:
URL:
https://preview.aclanthology.org/icon-24-ingestion/2024.emnlp-main.882/
DOI:
10.18653/v1/2024.emnlp-main.882
Bibkey:
Cite (ACL):
Muhammad Farid Adilazuarda, Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Shivdutt Singh, Alham Fikri Aji, Jacki O’Neill, Ashutosh Modi, and Monojit Choudhury. 2024. Towards Measuring and Modeling “Culture” in LLMs: A Survey. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15763–15784, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Towards Measuring and Modeling “Culture” in LLMs: A Survey (Adilazuarda et al., EMNLP 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2024.emnlp-main.882.pdf