Virendra Singh

2026

Revisiting Evaluation of Question Answering Systems in Low-Resource Indic Languages: Bridging Human and Metric Alignment
Anuj Kumar | Satyadev Ahlawat | Yamuna Prasad | Virendra Singh
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Evaluating Question Answering (QA) systems in low-resource Indic languages remains challenging due to the scarcity of annotated data, high linguistic diversity, and the absence of reliable evaluation metrics. Many Indian languages are severely underrepresented, making it difficult to accurately assess the performance of Large Language Models (LLMs) on QA tasks. Commonly used metrics such as BLEU, ROUGE-L, and BERTScore, while successful in machine translation and resource-rich scenarios, tend to perform poorly in low-resource QA settings. These metrics often exhibit compressed scoring ranges, excessive zero scores, and weak alignment with human judgments. To overcome these limitations, this work introduces LRM²QAS (Language Robust Multi-aspect Metrics for Question Answering Systems), a composite evaluation framework that integrates semantic similarity, factual completeness, numerical accuracy, and contextual relevance. The proposed metric is evaluated across eight Indic-language QA tasks using multiple LLMs, as well as on the open-domain benchmarks NaturalQuestions (NQ) and TriviaQA (TQ). Across all settings, LRM²QAS shows stronger agreement with human evaluation, as measured by Pearson, Spearman, and Kendall correlation coefficients. Experimental findings show that LRM²QAS distinguishes more precisely between model outputs and aligns more closely with human judgment, offering a reliable framework for evaluating multilingual QA in low-resource Indic languages.
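The agreement check described above (correlating a metric's scores with human judgments using Pearson, Spearman, and Kendall coefficients) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' evaluation code; the function names and the toy score lists are hypothetical. Because zero-inflated metrics produce many tied scores, the sketch assumes tie-aware variants: Spearman's rho is computed as Pearson's r on average ranks, and Kendall's coefficient is tau-b, which corrects for ties.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + ties only in x) * (C + D + ties only in y))."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: excluded from numerator and denominator
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical toy data: human ratings vs. a metric's scores for five answers.
    human = [1, 2, 3, 4, 5]
    metric = [0.10, 0.05, 0.40, 0.55, 0.90]
    for name, fn in [("Pearson", pearson), ("Spearman", spearman), ("Kendall tau-b", kendall_tau_b)]:
        print(f"{name}: {fn(human, metric):.3f}")
```

In practice a library implementation (for example SciPy's correlation functions) would normally be preferred; the hand-written version only makes the tie handling explicit.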

2025


LRMGS: A Language-Robust Metric for Evaluating Question Answering in Very Low-Resource Indic Languages
Anuj Kumar | Satyadev Ahlawat | Yamuna Prasad | Virendra Singh
The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Reliable evaluation of Question Answering (QA) systems in low-resource Indic languages presents a significant challenge due to limited annotated datasets, high linguistic diversity, and a lack of suitable evaluation metrics. Languages such as Sindhi, Manipuri, Dogri, Konkani, and Maithili are particularly underrepresented, making it difficult to assess Large Language Models (LLMs) on QA tasks. Existing metrics, including BLEU, ROUGE-L, and BERTScore, are effective in machine translation and high-resource settings; however, they often fail in low-resource QA due to score compression, zero-inflation, and poor scale alignment. To address this, LRMGS (Language-Robust Metric for Generative QA) is introduced to capture semantic and lexical agreement while preserving the score scale across languages. LRMGS is evaluated across 8 Indic languages and multiple LLMs, demonstrating consistently higher concordance with reference-based chrF++ scores, as measured by the Concordance Correlation Coefficient (CCC). Experimental results indicate that LRMGS discriminates system performance in very low-resource languages more accurately than existing metrics. This work establishes a robust and interpretable framework for evaluating QA systems in low-resource Indic languages, supporting more reliable multilingual model assessment.
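The comparison against chrF++ above uses the Concordance Correlation Coefficient, which, unlike Pearson's r, decreases when one score set is shifted or rescaled relative to the other. That property is what makes it sensitive to the score compression and scale misalignment the abstract describes. A minimal pure-Python sketch follows, assuming Lin's definition with population (1/n) moments, rho_c = 2 s_xy / (s_x² + s_y² + (mean_x − mean_y)²); the function names and toy scores are hypothetical, and this is not the authors' implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: invariant to shifting or rescaling either sequence."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The (mx - my)^2 term penalizes a location shift; unequal variances penalize rescaling.
    return 2 * cov_xy / (var_x + var_y + (mx - my) ** 2)


if __name__ == "__main__":
    # Hypothetical scores: the second metric is the first one halved (a "compressed" scale).
    x = [0.2, 0.4, 0.6, 0.8]
    y = [0.1, 0.2, 0.3, 0.4]
    print(round(pearson(x, y), 3))          # 1.0
    print(round(concordance_ccc(x, y), 3))  # 0.4
```

Pearson's r reports perfect agreement here because the two lists are exactly linearly related, whereas the CCC drops to 0.4 because the second list sits on a compressed, shifted scale.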


Video-guided Machine Translation: A Survey of Models, Datasets, and Challenges
Pinaki Das | Virendra Singh | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

In recent years, machine translation has evolved with the integration of multimodal information. Incorporating additional modalities into translation tasks reduces ambiguity and improves translation scores. Common modalities include images, speech, and videos, which provide additional context alongside the text to be translated. While multimodal translation with images has been extensively studied, video-guided machine translation (VMT) has gained increasing attention, particularly since Wang et al. (2019) first explored this task. In this paper, we provide a comprehensive overview of VMT, highlighting its unique challenges, methodologies, and recent advancements. Unlike previous surveys that primarily focus on image-guided multimodal machine translation, this work explores the distinct complexities and opportunities introduced by adding video as a modality to the translation task.

Co-authors

Gholamreza Haffari 1
