2025
“AGI” team at SHROOM-CAP: Data-Centric Approach to Multilingual Hallucination Detection using XLM-RoBERTa
Harsh Rathwa | Pruthwik Mishra | Shrikant Malviya
Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)
The detection of hallucinations in multilingual scientific text generated by Large Language Models (LLMs) presents significant challenges for reliable AI systems. This paper describes our submission to the SHROOM-CAP 2025 shared task on scientific hallucination detection across 9 languages. Unlike most approaches, which focus primarily on model architecture, we adopted a data-centric strategy that addresses the critical issue of training data scarcity and imbalance. We unified and balanced five existing datasets to create a comprehensive training corpus of 124,821 samples (50% correct, 50% hallucinated), a 172x increase over the original SHROOM training data. Our approach, which fine-tunes XLM-RoBERTa-Large (560 million parameters) on this enhanced dataset, achieves competitive performance across all languages, including 2nd place in Gujarati (a zero-shot language) with a Factuality F1 of 0.5107, and rankings between 4th and 6th place across the remaining 8 languages. Our results demonstrate that systematic data curation can significantly outperform architectural innovations alone, particularly for low-resource languages in zero-shot settings.
Team-SVNIT at JUST-NLP 2025: Domain-Adaptive Fine-Tuning of Multilingual Models for English–Hindi Legal Machine Translation
Rupesh Dhakad | Naveen Kumar | Shrikant Malviya
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)
Translating sentences between English and Hindi is challenging, especially in the domain of legal documents. The main sources of difficulty are specialized legal terminology, long and complex sentences, and strict accuracy requirements. This paper presents the system developed by Team-SVNIT for the JUST-NLP 2025 shared task on legal machine translation. We fine-tune and compare multiple pretrained multilingual translation models, including facebook/nllb-200-distilled-1.3B, on a corpus of 50,000 English–Hindi legal sentence pairs provided for the shared task. The training pipeline includes preprocessing, context windows of 512 tokens, and decoding methods to enhance translation quality. The proposed method secured 1st place on the official leaderboard with an AutoRank score of 61.62. We obtained the following scores on various metrics: BLEU 51.61, METEOR 75.80, TER 37.09, CHRF++ 73.29, BERTScore 92.61, and COMET 76.36. These results demonstrate that fine-tuning multilingual models for a domain-specific machine translation task improves performance over general multilingual translation systems.
“Clutch or Cry” Team at TRACS @ WASP2025: A Hybrid Stacking Ensemble for Astrophysical Document Classification
Arshad Khatib | Aayush Prasad | Rudra Trivedi | Shrikant Malviya
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications
Automatically identifying telescopes and their roles within astrophysical literature is crucial for large-scale scientific analysis and tracking instrument usage patterns. This paper describes the system developed by the “Clutch or Cry” team for the Telescope Reference and Astronomy Categorization Shared task (TRACS) at WASP 2025. The task involved two distinct challenges: multi-class telescope identification (Task 1) and multi-label role classification (Task 2). For Task 1, we employed a feature-centric approach combining document identifiers, metadata, and textual features to achieve high accuracy. For the more complex Task 2, we utilized a carefully designed two-level stacking ensemble. This hybrid model effectively fused symbolic information from a rule-based classifier with deep semantic understanding from a domain-adapted transformer. A subsequent meta-learning stage then performed targeted optimization for each role. These architectures were designed to address the primary challenges of handling long documents and managing severe class imbalance. A systematic optimization strategy focused on mitigating this imbalance significantly improved performance for minority classes. This work validates the effectiveness of using tailored, hybrid approaches and targeted optimization for complex classification tasks in specialized scientific domains.
2024
SK_DU Team: Cross-Encoder based Evidence Retrieval and Question Generation with Improved Prompt for the AVeriTeC Shared Task
Shrikant Malviya | Stamos Katsigiannis
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)
As part of the AVeriTeC shared task, we developed a pipelined system comprising robust and finely tuned models. Our system integrates advanced techniques for evidence retrieval and question generation, leveraging cross-encoders and large language models (LLMs). With multi-stage processing, the pipeline improves over baseline models through better evidence extraction, question generation and veracity prediction, particularly on complex claims that require nuanced reasoning. Through detailed experiments and ablation studies, we provide insights into the strengths and weaknesses of our approach, highlighting the critical role of evidence sufficiency and context dependency in automated fact-checking systems. Our system ranked 7th on the development data and 12th on the test data in the shared task, underscoring the effectiveness of our methods in addressing the challenges of real-world claim verification.
Evidence Retrieval for Fact Verification using Multi-stage Reranking
Shrikant Malviya | Stamos Katsigiannis
Findings of the Association for Computational Linguistics: EMNLP 2024
In the fact verification domain, the accuracy and efficiency of evidence retrieval are paramount. This paper presents a novel approach to enhance the fact verification process through a Multi-stage ReRanking (M-ReRank) paradigm, which addresses the inherent limitations of single-stage evidence extraction. Our methodology leverages the strengths of advanced reranking techniques, including dense retrieval models and list-aware rerankers, to optimise the retrieval and ranking of evidence of both structured and unstructured types. We demonstrate that our approach significantly outperforms previous state-of-the-art models, achieving a recall rate of 93.63% for Wikipedia pages. The proposed system not only improves the retrieval of relevant sentences and table cells but also enhances the overall verification accuracy. Through extensive experimentation on the FEVEROUS dataset, we show that our M-ReRank pipeline achieves substantial improvements in evidence extraction, particularly increasing the recall of sentences by 7.85%, tables by 8.29% and cells by 3% compared to the current state-of-the-art on the development set.
2021
Design and Development of Spoken Dialogue System in Indic Languages
Shrikant Malviya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Building on the modular architecture of a task-oriented Spoken Dialogue System (SDS), the presented work focussed on constructing all the system components as statistical models with parameters learned directly from data, resolving various language-specific and language-independent challenges along the way. To understand the research questions that underlie the spoken language understanding (SLU) and dialogue state tracking (DST) modules from the perspective of Indic languages (Hindi), we collect a dialogue corpus, the Hindi Dialogue Restaurant Search (HDRS) corpus, and compare various state-of-the-art SLU and DST models on it. For the dialogue manager (DM), we investigate deep reinforcement learning (RL) methods, e.g. actor-critic algorithms with experience replay. Next, for dialogue generation, we incorporate models based on the Recurrent Neural Network Language Generation (RNNLG) framework. For speech synthesis, the last component in the dialogue pipeline, we not only train several TTS systems but also propose a quality assessment framework to evaluate them.
2017
Sentiment Analysis: An Empirical Comparative Study of Various Machine Learning Approaches
Swapnil Jain | Shrikant Malviya | Rohit Mishra | Uma Shanker Tiwary
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)