Himanshu Dutta
2025
GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation
Himanshu Dutta | Sunny Manchanda | Prakhar Bapat | Meva Ram Gurjar | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Enterprises, public organizations, and localization providers increasingly rely on Document-level Machine Translation (DocMT) to process contracts, reports, manuals, and multimedia transcripts across languages. However, existing MT systems often struggle to handle discourse-level phenomena such as pronoun resolution, lexical cohesion, and ellipsis, resulting in inconsistent or incoherent translations. We propose **GRAFT**, a modular graph-based DocMT framework that leverages Large Language Model (LLM) agents to segment documents into discourse units, infer inter-discourse dependencies, extract structured memory, and generate context-aware translations. GRAFT transforms documents into directed acyclic graphs (DAGs) to explicitly model translation flow and discourse structure. Experiments across eight language directions and six domains show GRAFT outperforms commercial systems (e.g., Google Translate) and closed LLMs (e.g., GPT-4) by an average of 2.8 d-BLEU, and improves terminology consistency and discourse handling. GRAFT supports deployment with open-source LLMs (e.g., LLaMA, Qwen), making it cost-effective and privacy-preserving. These results position GRAFT as a robust solution for scalable, document-level translation in real-world applications.
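A minimal sketch of the DAG-ordered translation flow the abstract describes, assuming a simple dependency-dict representation of the discourse graph; `DiscourseUnit`, `llm_translate`, and `translate_document` are hypothetical names for illustration, not the paper's actual API:

```python
# Sketch: discourse units form a DAG, and each unit is translated after the
# units it depends on, with their translations passed along as context.
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass
class DiscourseUnit:
    uid: str
    text: str
    translation: str = ""


def llm_translate(text: str, context: list[str]) -> str:
    """Stand-in for an LLM agent call (e.g. an open-source LLaMA/Qwen model)."""
    return f"<translation of {text!r} given {len(context)} context unit(s)>"


def translate_document(units: dict[str, DiscourseUnit],
                       deps: dict[str, set[str]]) -> list[str]:
    """Translate units in topological order of the discourse DAG."""
    order = TopologicalSorter(deps).static_order()  # predecessors come first
    output = []
    for uid in order:
        unit = units[uid]
        context = [units[p].translation for p in deps.get(uid, set())]
        unit.translation = llm_translate(unit.text, context)
        output.append(unit.translation)
    return output


# Toy document: the pronoun in "u2" needs the context of "u1" to be resolved.
units = {
    "u1": DiscourseUnit("u1", "The contract was signed by Acme Corp."),
    "u2": DiscourseUnit("u2", "It takes effect next month."),
}
print(translate_document(units, {"u1": set(), "u2": {"u1"}}))
```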
Recon, Answer, Verify: Agents in Search of Truth
Satyam Shukla | Himanshu Dutta | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Human fact-checking is too slow to meet current demands, making automatic fact-checking systems an essential alternative. Evaluating such systems is challenging, as existing benchmark datasets suffer either from leakage or from evidence incompleteness, which limits the realism of current evaluations. We present Politi-Fact-Only (PFO), a 5-class benchmark dataset of 2,982 political claims from politifact.com, in which all post-claim analysis and annotator cues have been manually removed from the evidence articles. After filtering, the evidence contains only information available prior to the claim's verification. Evaluating on PFO, we observe an average performance drop of 11.39% in macro-F1 compared to PFO's unfiltered version. Based on the challenges identified in existing LLM-based fact-checking systems, we propose RAV (Recon-Answer-Verify), an agentic framework with three agents that iteratively generates and answers sub-questions to verify different aspects of the claim before producing the final label. Unlike prior work, we reduce follow-up question complexity by leveraging two types of structured questions, which either validate a fact or inquire about a fact. RAV generalizes across domains and label granularities, outperforming state-of-the-art methods by 57.5% on PFO (political, 5-class) and by 3.05% on the widely used HOVER dataset (encyclopedic, 2-class).
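A rough sketch of the Recon-Answer-Verify loop described above, under stated assumptions: the agent prompts, the `DONE` stopping signal, and `ask_llm` are hypothetical placeholders for illustration, not the paper's exact design.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a call to the underlying LLM backend."""
    return "stub response"


def rav_verify(claim: str, evidence: str, max_rounds: int = 3) -> str:
    qa_trace = []
    for _ in range(max_rounds):
        # Recon agent: pose one structured sub-question that either validates
        # a stated fact or inquires about a missing one.
        question = ask_llm(
            f"Claim: {claim}\nPrevious QA: {qa_trace}\n"
            "Generate one fact-validation or fact-inquiry sub-question, "
            "or reply DONE if no further checks are needed."
        )
        if question.strip() == "DONE":
            break
        # Answer agent: answer the sub-question from the pre-claim evidence only.
        answer = ask_llm(f"Evidence: {evidence}\nQuestion: {question}\nAnswer:")
        qa_trace.append((question, answer))
    # Verify agent: map the accumulated QA trace to a final veracity label.
    return ask_llm(
        f"Claim: {claim}\nQA trace: {qa_trace}\n"
        "Output one label: true / mostly-true / half-true / mostly-false / false."
    )
```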
An introduction to computational identification and classification of Upamā alaṅkāra
Bhakti Jadhav | Himanshu Dutta | Shruti Kanitkar | Malhar Kulkarni | Pushpak Bhattacharyya
Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025