Aryan Chandramania


2025

SHROOM-CAP: Shared Task on Hallucinations and Related Observable Overgeneration Mistakes in Crosslingual Analyses of Publications
Aman Sinha | Federica Gamba | Raúl Vázquez | Timothee Mickus | Ahana Chattopadhyay | Laura Zanella | Binesh Arakkal Remesh | Yash Kankanampati | Aryan Chandramania | Rohit Agarwal
Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)

This paper presents an overview of the SHROOM-CAP Shared Task, which focuses on detecting hallucinations and over-generation errors in cross-lingual analyses of scientific publications. SHROOM-CAP covers nine languages: five high-resource (English, French, Hindi, Italian, and Spanish) and four low-resource (Bengali, Gujarati, Malayalam, and Telugu). The task frames hallucination detection as a binary classification problem, in which participants must predict whether a given text contains factual inaccuracies and fluency mistakes. We received 1,571 submissions from 5 participating teams during the test phase across the nine languages. In this paper, we present an analysis of the evaluated systems to assess their performance on the hallucination detection task across languages. Our findings reveal a disparity in system performance between high-resource and low-resource languages. Furthermore, we observe that factuality and fluency tend to be closely aligned in high-resource languages, whereas this correlation is less evident in low-resource languages. Overall, SHROOM-CAP underlines that hallucination detection remains a challenging open problem, particularly in low-resource and domain-specific settings.

2024

Weighted Layer Averaging RoBERTa for Black-Box Machine-Generated Text Detection
Ayan Datta | Aryan Chandramania | Radhika Mamidi
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

We propose a novel approach for machine-generated text detection using a RoBERTa model with weighted layer averaging and AdaLoRA for parameter-efficient fine-tuning. Our method incorporates information from all model layers, capturing diverse linguistic cues beyond those accessible from the final layer alone. To mitigate potential overfitting and improve generalizability, we leverage AdaLoRA, which injects trainable low-rank matrices into each Transformer layer, significantly reducing the number of trainable parameters. Furthermore, we employ data mixing to ensure our model encounters text from various domains and generators during training, enhancing its ability to generalize to unseen data. This work highlights the potential of combining layer-wise information with parameter-efficient fine-tuning and data mixing for effective machine-generated text detection.
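To make the layer-averaging idea concrete, below is a minimal PyTorch sketch, assuming a `roberta-base` backbone and a two-class (human vs. machine) head. The class name `WeightedLayerRoberta`, the pooling choice, and all hyperparameters are illustrative rather than the authors' exact implementation; the AdaLoRA adapters and data mixing described in the abstract would be applied on top (e.g., via a parameter-efficient fine-tuning library) and are omitted here.

```python
# Minimal sketch (not the authors' exact implementation) of weighted layer
# averaging over RoBERTa hidden states with learnable per-layer weights.
import torch
import torch.nn as nn
from transformers import RobertaModel


class WeightedLayerRoberta(nn.Module):
    def __init__(self, model_name: str = "roberta-base", num_labels: int = 2):
        super().__init__()
        # output_hidden_states=True exposes the embedding output plus every layer
        self.backbone = RobertaModel.from_pretrained(
            model_name, output_hidden_states=True
        )
        num_layers = self.backbone.config.num_hidden_layers + 1  # + embedding layer
        # One learnable scalar weight per layer, normalized with a softmax
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.classifier = nn.Linear(self.backbone.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states: tuple of per-layer tensors, each (batch, seq_len, hidden)
        hidden = torch.stack(outputs.hidden_states, dim=0)      # (layers, B, L, H)
        weights = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        mixed = (weights * hidden).sum(dim=0)                   # weighted layer average
        pooled = mixed[:, 0]                                    # <s> token representation
        return self.classifier(pooled)                          # human vs. machine logits
```

Because the mixing weights are a softmax over a small vector of scalars, the model can learn which layers carry the most useful stylistic and linguistic cues while adding only a handful of extra parameters on top of the frozen or adapter-tuned backbone.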