Kasu Sai Kartheek Reddy

Also published as: Sai Kartheek Reddy Kasu


2026

Subtle and indirect hate speech remains an underexplored challenge in online safety research, particularly when harmful intent is embedded within misleading or manipulative narratives. Existing hate speech datasets primarily capture overt toxicity, underrepresenting the nuanced ways misinformation can incite or normalize hate. To address this gap, we present HateMirage, a novel dataset of Faux Hate comments designed to advance reasoning and explainability research on hate emerging from fake or distorted narratives. The dataset was constructed by identifying widely debunked misinformation claims from fact-checking sources and tracing related YouTube discussions, resulting in 4,530 user comments. Each comment is annotated along three interpretable dimensions: Target (who is affected), Intent (the underlying motivation or goal behind the comment), and Implication (its potential social impact). Unlike prior explainability datasets such as HateXplain and HARE, which offer token-level or single-dimensional reasoning, HateMirage introduces a multi-dimensional explanation framework that captures the interplay between misinformation, harm, and social consequence. We benchmark multiple open-source language models on HateMirage using ROUGE-L F1 and Sentence-BERT similarity to assess explanation coherence. Results suggest that explanation quality may depend more on pretraining diversity and reasoning-oriented data rather than on model scale alone. By coupling misinformation reasoning with harm attribution, HateMirage establishes a new benchmark for interpretable hate detection and responsible AI research.
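The benchmark scores model explanations with ROUGE-L F1, which compares a generated explanation to a reference via their longest common subsequence of tokens. A minimal pure-Python sketch of the metric (illustrative only; the evaluation presumably uses a standard implementation, and the example sentences below are hypothetical):

```python
def lcs_len(a: list, b: list) -> int:
    # classic dynamic-programming longest common subsequence length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    # tokenize by whitespace; precision/recall are LCS length over each side's length
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

score = rouge_l_f1(
    "the comment targets a minority group",
    "the comment targets a religious minority",
)  # LCS of 5 tokens over two 6-token sentences -> F1 = 5/6
```

Sentence-BERT similarity complements this surface-overlap score by comparing the two explanations in embedding space, so paraphrased but faithful explanations are not unduly penalized.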

2024

The rapid expansion of social media has led to an increase in code-mixed content, presenting significant challenges for the effective detection of hate speech and fake narratives. To advance research in this area, a shared task titled Decoding Fake Narratives in Spreading Hateful Stories (Faux-Hate) was organized as part of ICON 2024. This paper introduces a multi-task learning model designed to classify Hindi-English code-mixed tweets into two distinct categories: hate speech and false content. The proposed framework utilizes fastText embeddings to create a shared feature space that adeptly captures the semantic and syntactic intricacies of code-mixed text, including transliterated terms and out-of-vocabulary words. These shared embeddings are then processed through two independent Support Vector Machine (SVM) classifiers, each tailored to one of the classification tasks. Our team secured 10th place among the participating teams, as evaluated by the organizers based on Macro F1 scores.
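The reason fastText embeddings handle transliterated and out-of-vocabulary words is that each word vector is built from character n-gram vectors, so unseen spellings still receive a representation. A toy sketch of that subword idea and of the shared feature space feeding two task-specific classifiers (illustrative only: the hash-based pseudo-embeddings below stand in for learned fastText vectors, and the actual system trains SVMs on top):

```python
import hashlib

DIM = 8  # toy embedding size; pretrained fastText typically uses 300

def ngram_vector(ngram: str) -> list:
    # deterministic pseudo-embedding per character n-gram via hashing
    # (a stand-in for fastText's learned n-gram vectors)
    h = int(hashlib.md5(ngram.encode()).hexdigest(), 16)
    return [((h >> (4 * i)) % 7 - 3) / 3.0 for i in range(DIM)]

def word_vector(word: str, n: int = 3) -> list:
    # fastText-style: a word is the average of its character n-gram vectors,
    # so transliterated or out-of-vocabulary words still get a representation
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)] or [padded]
    vecs = [ngram_vector(g) for g in grams]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]

def tweet_vector(tweet: str) -> list:
    # shared feature space: average of word vectors; in the multi-task setup,
    # this single representation is passed to both independent SVM classifiers
    vecs = [word_vector(w) for w in tweet.split()]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]
```

The multi-task structure lies in sharing `tweet_vector` across both tasks while keeping the hate-speech and fake-content classifiers independent, so each decision boundary is learned separately over the same features.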
Hateful online content is a growing concern, especially for young people. While social media platforms aim to connect us, they can also become breeding grounds for negativity and harmful language. This study tackles the issue by proposing a novel framework called HOLD-Z, specifically designed to detect hate and offensive comments in Telugu-English code-mixed social media content. HOLD-Z combines three models: an LSTM architecture, Zypher, and openchat_3.5. The study highlights the effectiveness of prompt engineering and Quantized Low-Rank Adaptation (QLoRA) in boosting performance. Notably, HOLD-Z secured 9th place in the HOLD-Telugu DravidianLangTech@EACL-2024 shared task, showcasing its potential for tackling the complexities of hate and offensive comment classification.
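QLoRA makes fine-tuning the larger models tractable by freezing the (quantized) base weights and learning only a low-rank update: the effective weight becomes W + (alpha/r) * B @ A, where B is d-by-r, A is r-by-k, and the rank r is much smaller than d or k. A toy pure-Python illustration of that low-rank update (a sketch of the arithmetic only; the real method operates on 4-bit quantized transformer layers through adapter libraries):

```python
def matmul(X, Y):
    # naive matrix multiply for small illustrative matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_adapt(W, A, B, alpha: float, r: int):
    # effective weight = frozen W + (alpha / r) * (B @ A);
    # only A and B (rank-r factors) are trained, W stays fixed
    delta = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# rank-1 update of a frozen 2x2 weight matrix
W = [[0.0, 0.0], [0.0, 0.0]]
B = [[1.0], [0.0]]       # d x r
A = [[2.0, 3.0]]         # r x k
adapted = lora_adapt(W, A, B, alpha=2.0, r=1)  # -> [[4.0, 6.0], [0.0, 0.0]]
```

With r small, the number of trainable parameters drops from d*k to r*(d+k), which is what lets a quantized base model be adapted on modest hardware.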