Matteo Guida
2026
Self-Explaining Hate Speech Detection with Moral Rationales
Francielle Vargas | Jackson Trager | Diego Alves | Matteo Guida | Surendrabikram Thapa | Berk Atıl | Daryna Dementieva | Andrew J Smart | Ameeta Agrawal
Findings of the Association for Computational Linguistics: ACL 2026
Existing hate speech detection models are often opaque and rely on surface-level lexical cues, which makes them vulnerable to spurious correlations and limits robustness, interpretability and cultural contextualization. We propose Supervised Moral Rationale Attention (SMRA), the first self-explaining hate speech detection framework to incorporate moral rationales as direct supervision for attention alignment. Based on Moral Foundations Theory, SMRA aligns token-level attention with expert-annotated moral rationales, guiding models to attend to morally salient spans. Unlike prior rationale-supervised or post-hoc approaches, SMRA integrates moral rationale supervision directly into the training objective, producing inherently interpretable and contextualized explanations. To support our framework, we also introduce HateBRMoralXplain, a Brazilian Portuguese benchmark dataset annotated with hate labels, moral categories, token-level moral rationales, and socio-political metadata. Across binary hate speech detection and multi-label moral sentiment classification, SMRA consistently improves performance while enhancing both faithful and plausible explanations. Although explanations become more concise, sufficiency decreases, indicating more compact and informative rationales. Fairness remains stable, suggesting that improvements in explanation quality do not introduce significant bias trade-offs.
Not all ANIMALs are equal: metaphorical framing through source domains and semantic frames
Yulia Otmakhova | Matteo Guida | Lea Frermann
Findings of the Association for Computational Linguistics: ACL 2026
Metaphors are powerful framing devices, yet their source domains alone do not fully explain the specific associations they evoke. We argue that the interplay between source domains and semantic frames determines how metaphors shape understanding of complex issues, and present a computational framework that derives salient discourse metaphors through their source domains and semantic frames. Applying this framework to climate change news, we not only uncover well-known source domains but also reveal nuanced frame-level associations that distinguish how the issue is portrayed. In analyzing immigration discourse across political ideologies, we demonstrate that liberals and conservatives systematically employ different semantic frames within the same source domains, with conservatives favoring frames emphasizing uncontrollability and liberals choosing neutral or more “victimizing” semantic frames. Our work bridges conceptual metaphor theory and linguistics, providing the first NLP approach for the discovery of discourse metaphors and fine-grained analysis of differences in metaphorical framing.
Article and Comment Frames Shape the Quality of Online Comments
Matteo Guida | Yulia Otmakhova | Eduard Hovy | Lea Frermann
Findings of the Association for Computational Linguistics: ACL 2026
Framing theory posits that how information is presented shapes audience responses, but computational work has largely ignored audience reactions. While recent work has shown that article framing systematically shapes the content of reader responses, this paper asks: does framing also affect response quality? Analyzing 1M comments across 2.7K news articles, we operationalize quality as comment health. We find that article frames significantly predict comment health while controlling for topic, and that comments that adopt the article frame are healthier than those that depart from it. Further, unhealthy top-level comments tend to generate more unhealthy responses, independent of the frame used in the comment. Our results establish a link between framing theory and discourse quality, laying the groundwork for downstream applications. We illustrate this potential with a proactive, frame-aware, LLM-based system to mitigate unhealthy discourse.
2025
LLMs for Argument Mining: Detection, Extraction, and Relationship Classification of pre-defined Arguments in Online Comments
Matteo Guida | Yulia Otmakhova | Eduard Hovy | Lea Frermann
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association
Automated large-scale analysis of public discussions around contested issues like abortion requires detecting and understanding the use of arguments. While Large Language Models (LLMs) have shown promise in language processing tasks, their performance in mining topic-specific, pre-defined arguments in online comments remains underexplored. We evaluate four state-of-the-art LLMs on three argument mining tasks using datasets comprising over 2,000 opinion comments across six polarizing topics. Quantitative evaluation suggests an overall strong performance across the three tasks, especially for large and fine-tuned LLMs, albeit at a significant environmental cost. However, a detailed error analysis revealed systematic shortcomings on long and nuanced comments and emotionally charged language, raising concerns for downstream applications like content moderation or opinion analysis. Our results highlight both the promise and current limitations of LLMs for automated argument analysis in online comments.
MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Multi-hop Hate Speech Explanation
Jackson Trager | Francielle Vargas | Diego Alves | Matteo Guida | Mikel K. Ngueajio | Ameeta Agrawal | Yalda Daryani | Farzan Karimi Malekabadi | Flor Miriam Plaza-del-Arco
Findings of the Association for Computational Linguistics: EMNLP 2025
Ensuring the moral reasoning capabilities of Large Language Models (LLMs) is a growing concern as these systems are used in socially sensitive tasks. Nevertheless, current evaluation benchmarks present two major shortcomings: a lack of annotations that justify moral classifications, which limits transparency and interpretability; and a predominant focus on English, which constrains the assessment of moral reasoning across diverse cultural settings. In this paper, we introduce MFTCXplain, a multilingual benchmark dataset for evaluating the moral reasoning of LLMs via multi-hop hate speech explanations using Moral Foundations Theory. MFTCXplain comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Our results show a misalignment between LLM outputs and human annotations in moral reasoning tasks. While LLMs perform well in hate speech detection (F1 up to 0.836), their ability to predict moral sentiments is notably weak (F1 < 0.35). Furthermore, rationale alignment remains limited, mainly in underrepresented languages. Our findings show the limited capacity of current LLMs to internalize and reflect human moral reasoning.