Pradyot Prakash

2025

Factuality evaluation aims to detect factual errors produced by language models (LMs) and hence guide the development of more factual models. Towards this goal, we train a factuality evaluator, FenCE, that provides LM generators with claim-level factuality feedback. In particular, we train FenCE to (1) generate textual critiques along with scores and (2) make claim-level judgment based on diverse source documents obtained by various tools, via data augmentation on a combination of public judgment datasets. We then present a framework that leverages FenCE to improve the factuality of LM generators by constructing training data. Specifically, we generate a set of candidate responses, ask FenCE to revise and score each response without introducing lesser-known facts, and train the generator by preferring highly scored revised responses. Experiments show that our data augmentation methods improve the evaluator’s accuracy by 2.9% on LLM-AggreFact. With FenCE, we improve Llama2-7B-chat/Llama3-8B-chat’s factuality rate by 16.86%/14.45% on FActScore, outperforming state-of-the-art factuality finetuning methods by 8.83%/6.96%.

pdf bib abs
Dynamic Strategy Planning for Efficient Question Answering with Large Language Models
Tanmay Parekh | Pradyot Prakash | Alexander Radovic | Akshay Shekher | Denis Savenkov
Findings of the Association for Computational Linguistics: NAACL 2025

Research has shown an effectiveness of reasoning (e.g. Chain-of-Thought), planning (e.g. SelfAsk) and retrieval augmented generation strategies to improve performance of Large Language Models (LLMs) on various tasks, such as question answering. However, using a single fixed strategy for answering all different kinds of questions is sub-optimal in performance and inefficient in terms of generated tokens and retrievals. In our work, we propose a novel technique, DyPlan, to induce a dynamic strategy selection process in LLMs for cost-effective question-answering. DyPlan incorporates an initial decision step to select the most suitable strategy conditioned on the input question and guides the LLM’s response generation accordingly. We extend DyPlan to DyPlan-verify, adding an internal verification and correction process to further enrich the generated answer. Experimentation on three prominent multi-hop question answering (MHQA) datasets reveals how DyPlan can improve model performance by 7-13% while reducing the cost by 11-32% relative to the best baseline model.

2017

pdf bib abs
Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT
Anoop Kunchukuttan | Maulik Shah | Pradyot Prakash | Pushpak Bhattacharyya
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We investigate pivot-based translation between related languages in a low resource, phrase-based SMT setting. We show that a subword-level pivot-based SMT model using a related pivot language is substantially better than word and morpheme-level pivot models. It is also highly competitive with the best direct translation model, which is encouraging as no direct source-target training corpus is used. We also show that combining multiple related language pivot models can rival a direct translation model. Thus, the use of subwords as translation units coupled with multiple related pivot languages can compensate for the lack of a direct parallel corpus.