Siddhant Shivdutt Singh
2025
EtiCor++: Towards Understanding Etiquettical Bias in LLMs
Ashutosh Dwivedi | Siddhant Shivdutt Singh | Ashutosh Modi
Findings of the Association for Computational Linguistics: ACL 2025
In recent years, researchers have started analyzing the cultural sensitivity of LLMs, and etiquettes have been an active area of research in this respect. Etiquettes are region-specific and an essential part of a region's culture; hence, it is imperative to make LLMs sensitive to etiquettes. However, there is a lack of resources for evaluating LLMs' understanding of and bias with regard to etiquettes. In this resource paper, we introduce EtiCor++, a corpus of etiquettes from around the world. We introduce different tasks for evaluating LLMs' knowledge of etiquettes across various regions, as well as various metrics for measuring bias in LLMs. Extensive experimentation with LLMs reveals an inherent bias towards certain regions.
2024
Towards Measuring and Modeling “Culture” in LLMs: A Survey
Muhammad Farid Adilazuarda | Sagnik Mukherjee | Pradhyumna Lavania | Siddhant Shivdutt Singh | Alham Fikri Aji | Jacki O’Neill | Ashutosh Modi | Monojit Choudhury
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs). We observe that none of the studies explicitly define “culture,” which is a complex, multifaceted concept; instead, they probe the models on specially designed datasets that represent certain aspects of “culture.” We call these aspects the proxies of culture and organize them along two dimensions: demographic and semantic proxies. We also categorize the probing methods employed. Our analysis indicates that only certain aspects of “culture,” such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness of probing techniques and of situated studies on the impact of cultural mis- and under-representation in LLM-based applications.