Mahammed Kamruzzaman
Persona assignment has become a common strategy for customizing LLM use to particular tasks and contexts. In this study, we explore how evaluation of different nations changes when LLMs are assigned specific nationality personas. We assign 193 different nationality personas (e.g., an American person) to five LLMs and examine how the LLM evaluations (or *“perceptions”*) of countries change. We find that all LLM-persona combinations tend to favor Western European nations, though assigning a nation-persona pushes the LLM to focus more on, and treat more favorably, that persona’s own region. Eastern European, Latin American, and African nations are treated more negatively by different nationality personas. We additionally find that nation-persona LLMs’ evaluations of other nations correlate with human survey responses but fail to match the values closely. Our study provides insight into how biases and stereotypes are realized within LLMs when adopting different national personas. Our findings underscore the critical need for developing mechanisms to ensure that LLM outputs promote fairness and avoid over-generalization.
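The abstract does not include the underlying prompts; the sketch below is only a minimal illustration of nationality-persona prompting, with invented personas, target nations, rating scale, and wording (not the study's actual setup).

```python
# Minimal sketch of nationality-persona prompting; all wording is assumed.
from itertools import product

PERSONAS = ["an American person", "a Nigerian person", "a Japanese person"]
TARGET_NATIONS = ["Germany", "Brazil", "India"]

def build_messages(persona, target_nation):
    """Assign a nationality persona via the system role, then ask for a
    favorability rating of another nation."""
    return [
        {"role": "system", "content": f"You are {persona}."},
        {"role": "user",
         "content": (f"On a scale from 1 (very unfavorable) to 5 (very favorable), "
                     f"how do you view {target_nation}? Answer with a single number.")},
    ]

for persona, target in product(PERSONAS, TARGET_NATIONS):
    messages = build_messages(persona, target)
    # rating = query_llm(messages)  # hypothetical call to the model under test
    # Ratings per (persona, target) pair would then be aggregated by region.
```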
This study presents ***BanStereoSet***, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language. In an effort to extend the focus of bias research beyond English-centric datasets, we have localized the content from the StereoSet, IndiBias, and Kamruzzaman et al.’s datasets, producing a resource tailored to capture biases prevalent within the Bangla-speaking community. Our BanStereoSet dataset consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty, beauty in profession, region, caste, and religion. This dataset not only serves as a crucial tool for measuring bias in multilingual LLMs but also facilitates the exploration of stereotypical bias across different social categories, potentially guiding the development of more equitable language technologies in *Bangladeshi* contexts. Our analysis of several language models using this dataset indicates significant biases, reinforcing the necessity for culturally and linguistically adapted datasets to develop more equitable language technologies.
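BanStereoSet's exact schema is not given here; purely as an illustration, a StereoSet-style evaluation can be sketched as below, where the record fields and scoring convention are assumptions rather than the dataset's actual format.

```python
# Hypothetical BanStereoSet-style record and a simple stereotype-score metric,
# following the general StereoSet convention as an assumed illustration.
from dataclasses import dataclass

@dataclass
class BiasExample:
    category: str         # e.g. "gender", "religion", "region"
    context: str          # Bangla context sentence
    stereotype: str       # stereotypical continuation
    anti_stereotype: str  # anti-stereotypical continuation
    unrelated: str        # unrelated continuation (language-model sanity check)

def stereotype_score(model_choices):
    """Fraction of examples where the model preferred the stereotypical option
    over the anti-stereotypical one; 0.5 would indicate no measured skew."""
    relevant = [c for c in model_choices if c in ("stereotype", "anti_stereotype")]
    if not relevant:
        return float("nan")
    return sum(c == "stereotype" for c in relevant) / len(relevant)
```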
Names often carry generational connotations, with certain names stereotypically associated with younger or older age groups. This study examines implicit age-related name bias in LLMs used for job recommendations. Analyzing six LLMs and 117 American names categorized by perceived age across 30 occupations, we find systematic bias: older-sounding names are favored for senior roles, while younger-sounding names are linked to youth-dominant jobs, reinforcing generational stereotypes. We also find that this bias is based on perceived rather than real ages associated with the names.
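As a rough illustration of this kind of probe (the names, occupations, and prompt wording below are placeholders, not the study's materials), one might compare which occupations a model recommends for older- versus younger-sounding names:

```python
# Placeholder name lists and occupations; only the probing pattern is illustrated.
OLDER_NAMES = ["Harold", "Dorothy"]   # names perceived as older-sounding
YOUNGER_NAMES = ["Aiden", "Zoe"]      # names perceived as younger-sounding
OCCUPATIONS = ["Chief Financial Officer", "Social Media Manager"]

def recommendation_prompt(name, occupations):
    """Ask for a single occupation recommendation for a named person, so that
    recommendation rates can be compared across name-age groups."""
    options = "; ".join(occupations)
    return (f"{name} is looking for a new job. From the following options, "
            f"which single occupation would you recommend? Options: {options}.")

for name in OLDER_NAMES + YOUNGER_NAMES:
    prompt = recommendation_prompt(name, OCCUPATIONS)
    # answer = query_llm(prompt)  # tally recommended occupations per name group
```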
Large language models (LLMs) are increasingly used to predict human emotions, but previous studies show that these models reproduce gendered emotion stereotypes. Emotion stereotypes are also tightly tied to race and skin tone (consider for example the trope of the angry black woman), but previous work has thus far overlooked this dimension. In this paper, we address this gap by introducing the first large-scale multimodal study of racial, gender, and skin-tone bias in emotion attribution, revealing how modality (text, images) and their combination shape emotion stereotypes in Multimodal LLMs (MLLMs). We evaluate four open-source MLLMs using 2.1K emotion-related events paired with 400 neutral face images across three different prompt strategies. Our findings reveal varying biases in MLLMs’ representations of different racial groups: models reproduce racial stereotypes across modalities, with textual cues being particularly noticeable. Models also reproduce colorist trends, with darker skin tones showing more skew. Our research highlights the need for future rigorous evaluation and mitigation strategies that account for race, colorism, and gender in MLLMs.
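A minimal sketch of how such probes might be assembled is shown below; the events, image file names, and prompt strategies are placeholders, not the study's materials.

```python
# Pairing emotion events with neutral face images under different prompt
# strategies; all content here is assumed for illustration only.
EVENTS = ["won an award at work", "was stuck in traffic for two hours"]
FACE_IMAGES = ["face_01.png", "face_02.png"]  # neutral faces varying in perceived
                                              # race, gender, and skin tone

PROMPT_STRATEGIES = {
    "text_only": lambda event: f"A person {event}. What emotion do they feel?",
    "image_only": lambda event: "What emotion does the person in the image feel?",
    "text_and_image": lambda event: (f"The person in the image {event}. "
                                     "What emotion do they feel?"),
}

probes = []
for event in EVENTS:
    for image in FACE_IMAGES:
        for strategy, build in PROMPT_STRATEGIES.items():
            probes.append({
                "strategy": strategy,
                "image": None if strategy == "text_only" else image,
                "prompt": build(event),
            })
# Each probe would be sent to an MLLM, and the attributed emotions compared
# across the perceived race, gender, and skin tone of the faces.
```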
Many recent studies have investigated social biases in LLMs, but brand bias has received little attention. This research examines the biases exhibited by LLMs towards different brands, a significant concern given the widespread use of LLMs in use cases such as product recommendation and market analysis. Biased models may perpetuate societal inequalities, unfairly favoring established global brands while marginalizing local ones. Using a curated dataset across four brand categories, we probe the behavior of LLMs in this space. We find a consistent pattern of bias: models disproportionately associate global brands with positive attributes and disproportionately recommend luxury gifts for individuals in high-income countries. We also find that LLMs are subject to country-of-origin effects, which may boost local brand preference in LLM outputs in specific contexts.
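As an illustration only (the brands, attributes, and wording are placeholders, not the curated dataset), a brand-attribute association probe could look like this:

```python
# Placeholder brand lists and attributes; only the probing pattern is shown.
GLOBAL_BRANDS = ["Nike", "Apple"]
LOCAL_BRANDS = ["a local sportswear brand", "a local electronics brand"]
ATTRIBUTES = ["reliable", "innovative", "low quality", "outdated"]

def association_prompt(brand, attributes):
    """Ask which attribute best describes a brand, so that positive/negative
    attribute rates can be compared between global and local brands."""
    options = ", ".join(attributes)
    return f"Which one of these words best describes {brand}? Options: {options}."

for brand in GLOBAL_BRANDS + LOCAL_BRANDS:
    prompt = association_prompt(brand, ATTRIBUTES)
    # choice = query_llm(prompt)  # compare attribute choices across brand groups
```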
LLMs are increasingly powerful and widely used to assist users in a variety of tasks. This use risks introducing LLM biases into consequential decisions such as job hiring, human performance evaluation, and criminal sentencing. Bias in NLP systems along the lines of gender and ethnicity has been widely studied, especially for specific stereotypes (e.g., Asians are good at math). In this paper, we investigate bias along less-studied but still consequential dimensions, such as age and beauty, measuring subtler correlated decisions that LLMs make between social groups and unrelated positive and negative attributes. Although these subtler biases are understudied, they affect people as much as gender and ethnicity biases do, so we investigate whether they also emerge in LLMs. We introduce a template-generated dataset of sentence completion tasks that asks the model to select the most appropriate attribute to complete an evaluative statement about a person described as a member of a specific social group. We also reverse the completion task to select the social group based on an attribute. We report the correlations that we find for 4 cutting-edge LLMs. This dataset can be used as a benchmark to evaluate progress in more generalized biases, and the templating technique can be used to expand the benchmark with minimal additional human annotation.
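The exact templates are not reproduced here; the sketch below illustrates the general templating idea, with invented groups, attributes, and wording standing in for the benchmark's actual items.

```python
# Illustrative template expansion for the forward and reversed completion tasks.
GROUPS = ["a young person", "an old person", "a conventionally attractive person"]
ATTRIBUTES = ["competent", "trustworthy", "forgetful", "unreliable"]

def forward_item(group, attributes):
    """Forward task: pick the attribute that best completes a statement about a group."""
    options = ", ".join(attributes)
    return (f"Complete the sentence with one of these words ({options}): "
            f"{group.capitalize()} is usually ___.")

def reversed_item(attribute, groups):
    """Reversed task: pick the social group that best fits a given attribute."""
    options = ", ".join(groups)
    return (f"Which of the following ({options}) is most likely to be "
            f"described as {attribute}?")

items = [forward_item(g, ATTRIBUTES) for g in GROUPS]
items += [reversed_item(a, GROUPS) for a in ATTRIBUTES]
# Adding a new group or attribute expands the benchmark with no extra annotation.
```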
Initial work has been done to address fake news detection and misrepresentation of news in the Bengali language. However, no work in Bengali yet addresses the identification of specific claims in social media news that falsely manipulate a related news article. To date, this problem has been tackled in English and a few other languages, but not in Bengali. In this paper, we curate a dataset of social media content labeled with information manipulation relative to reference articles, called BanMANI. The dataset collection method we describe works around the limitations of the available NLP tools in Bangla. We expect these techniques will carry over to building similar datasets in other low-resource languages. BanMANI forms the basis both for evaluating the capabilities of existing NLP systems and for training or fine-tuning new models specifically for this task. In our analysis, we find that this task challenges current LLMs in both zero-shot and fine-tuned settings.