Ashiqur R. KhudaBukhsh
2026
What About the Scene With the Hitler Reference? HAUNT: A Framework to Probe LLMs’ Self-consistency in Closed Domains Via Adversarial Nudge
Arka Dutta | Sujan Dutta | Rijul Magu | Soumyajit Datta | Munmun De Choudhury | Ashiqur R. KhudaBukhsh
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Hallucinations pose a critical challenge to the real-world deployment of large language models (LLMs) in high-stakes domains. In this paper, we present a framework for stress-testing factual fidelity in LLMs in the presence of adversarial nudge. Our framework consists of three steps. First, we instruct the LLM to produce sets of truths and lies consistent with the closed domain in question. Next, we instruct the LLM to verify the same set of assertions as truths and lies consistent with the same closed domain. Finally, we test the robustness of the LLM against the lies generated (and verified) by itself. Our extensive evaluation, conducted using five widely known proprietary and six open LLMs across two closed domains of popular movies and novels, reveals a wide range of susceptibility to adversarial nudges: even among the strongest proprietary LLMs, Claude exhibits strong resilience, GPT and Grok demonstrate moderate resilience, while Gemini and DeepSeek show weak resilience, and open models fall significantly short.
2025
Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech
Jonathan Pofcher | Christopher M Homan | Randall Sell | Ashiqur R. KhudaBukhsh
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
This paper makes three contributions. First, via a substantial corpus of 1,419,047 comments posted on 3,161 YouTube news videos of major US cable news outlets, we analyze how users engage with LGBTQ+ news content. Our analyses focus on both positive and negative content. In particular, we construct a hope speech classifier that detects positive (hope speech), negative, neutral, and irrelevant content. Second, in consultation with a public health expert specializing in LGBTQ+ health, we conduct an annotation study with a balanced and diverse political representation and release a dataset of 3,750 instances with crowd-sourced labels and detailed annotator demographic information. Finally, beyond providing a vital resource for the LGBTQ+ community, our annotation study and subsequent in-the-wild assessments reveal (1) a strong association between rater political beliefs and how raters rate content relevant to a marginalized community, (2) that models trained on individual political beliefs exhibit considerable in-the-wild disagreement, and (3) that zero-shot large language models (LLMs) align more with liberal raters.
2024
Rater Cohesion and Quality from a Vicarious Perspective
Deepak Pandita | Tharindu Cyril Weerasooriya | Sujan Dutta | Sarah K. Luger | Tharindu Ranasinghe | Ashiqur R. KhudaBukhsh | Marcos Zampieri | Christopher M. Homan
Findings of the Association for Computational Linguistics: EMNLP 2024
Human feedback is essential for building human-centered AI systems across domains where disagreement is prevalent, such as AI safety, content moderation, or sentiment analysis. Many disagreements, particularly in politically charged settings, arise because raters hold opposing values or beliefs. Vicarious annotation is a method for breaking down disagreement by asking raters how they think others would annotate the data. In this paper, we explore the use of vicarious annotation with analytical methods for moderating rater disagreement. We employ rater cohesion metrics to study the potential influence of political affiliations and demographic backgrounds on raters' perceptions of offense. Additionally, we use CrowdTruth's rater quality metrics, which consider the demographics of the raters, to score the raters and their annotations. We study how these rater quality metrics influence in-group and cross-group rater cohesion at both the personal and vicarious levels.