Shion Guha


2025

Auditing Political Bias in Text Generation by GPT-4 using Sociocultural and Demographic Personas: Case of Bengali Ethnolinguistic Communities
Dipto Das | Syed Ishtiaque Ahmed | Shion Guha
Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)

Though large language models (LLMs) are increasingly used in multilingual contexts, their political and sociocultural biases in low-resource languages remain critically underexplored. In this paper, we investigate how LLM-generated texts in Bengali shift in response to personas with varying political orientations (left vs. right), religious identities (Hindu vs. Muslim), and national affiliations (Bangladeshi vs. Indian). In a quasi-experimental study, we simulate these personas and prompt an LLM to respond to political discussions. Measuring the shifts relative to responses for a baseline Bengali persona, we examine how political orientation influences LLM outputs, how topical associations shape the political leanings of outputs, and how demographic persona-induced changes align with the differently politically oriented variations. Our findings highlight a left-leaning political bias in Bengali text generation and its significant association with Muslim sociocultural and demographic identity. We also connect our findings with broader discussions around emancipatory politics, epistemological considerations, and the alignment of multilingual models.
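The persona-simulation setup described in the abstract could be sketched roughly as follows. This is a minimal illustration, assuming the OpenAI chat completions API (openai>=1.0); the persona descriptions, the discussion prompt, and the sampling settings are hypothetical placeholders, not the paper's actual instruments or measurement procedure.

```python
# Minimal sketch: condition GPT-4 on simulated sociocultural/demographic
# personas via the system message, then collect Bengali responses to a
# political discussion prompt for later comparison against the baseline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical persona descriptions (baseline Bengali persona plus variants).
PERSONAS = {
    "baseline": "You are a Bengali speaker.",
    "left": "You are a Bengali speaker with left-leaning political views.",
    "right": "You are a Bengali speaker with right-leaning political views.",
    "hindu": "You are a Bengali-speaking Hindu.",
    "muslim": "You are a Bengali-speaking Muslim.",
    "bangladeshi": "You are a Bengali speaker from Bangladesh.",
    "indian": "You are a Bengali speaker from India.",
}

def generate_response(persona_prompt: str, discussion_prompt: str) -> str:
    """Ask GPT-4 to respond to a political discussion prompt in Bengali,
    conditioned on a simulated persona via the system message."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": persona_prompt},
            {"role": "user", "content": discussion_prompt},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

# Placeholder discussion prompt: "What is your opinion on this political issue?"
discussion = "এই রাজনৈতিক বিষয়ে আপনার মতামত কী?"
responses = {name: generate_response(desc, discussion)
             for name, desc in PERSONAS.items()}
# Downstream analysis would quantify how each persona's responses shift
# relative to responses["baseline"].
```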

2023

Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity
Dipto Das | Shion Guha | Bryan Semaan
Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)

Critical studies have found NLP systems to exhibit bias based on gender and racial identities. However, few studies have focused on identities defined by cultural factors like religion and nationality. Compared to English, such research efforts are even more limited in major languages like Bengali due to the unavailability of labeled datasets. This paper describes a process for developing a bias evaluation dataset highlighting cultural influences on identity. We also provide a Bengali dataset as an artifact outcome that can contribute to future critical research.