Delip Rao


2025

Probabilistic Soundness Guarantees in LLM Reasoning Chains
Weiqiu You | Anton Xue | Shreya Havaldar | Delip Rao | Helen Jin | Chris Callison-Burch | Eric Wong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

In reasoning chains generated by large language models (LLMs), initial errors often propagate and undermine the reliability of the final conclusion. Current LLM-based error detection methods often fail to detect propagated errors because earlier errors can corrupt judgments of downstream reasoning. To better detect such errors, we introduce Autoregressive Reasoning Entailment Stability (ARES), a probabilistic framework that evaluates each reasoning step based solely on previously verified premises. This inductive method yields a nuanced score for each step and provides certified statistical guarantees of its soundness, rather than a brittle binary label. ARES achieves state-of-the-art performance across four benchmarks (72.1% Macro-F1, +8.2 points) and demonstrates superior robustness on very long synthetic reasoning chains, where it excels at detecting propagated errors (90.3% F1, +27.6 points).

NSF-SciFy: Mining the NSF Awards Database for Scientific Claims
Delip Rao | Weiqiu You | Eric Wong | Chris Callison-Burch
Proceedings of The 5th New Frontiers in Summarization Workshop

We introduce NSF-SciFy, a comprehensive dataset of scientific claims and investigation proposals extracted from National Science Foundation award abstracts. While previous scientific claim verification datasets have been limited in size and scope, NSF-SciFy represents a significant advance with an estimated 2.8 million claims from 400,000 abstracts spanning all science and mathematics disciplines. We present two focused subsets: NSF-SciFy-MatSci with 114,000 claims from materials science awards, and NSF-SciFy-20K with 135,000 claims across five NSF directorates. Using zero-shot prompting, we develop a scalable approach for joint extraction of scientific claims and investigation proposals. We demonstrate the dataset’s utility through three downstream tasks: non-technical abstract generation, claim extraction, and investigation proposal extraction. Fine-tuning language models on our dataset yields substantial improvements, with relative gains often exceeding 100%, particularly for claim and proposal extraction tasks. Our error analysis reveals that extracted claims exhibit high precision but lower recall, suggesting opportunities for further methodological refinement. NSF-SciFy enables new research directions in large-scale claim verification, scientific discovery tracking, and meta-scientific analysis.

2023

Learning Interpretable Style Embeddings via Prompting LLMs
Ajay Patel | Delip Rao | Ansh Kothary | Kathleen McKeown | Chris Callison-Burch
Findings of the Association for Computational Linguistics: EMNLP 2023

Style representation learning builds content-independent representations of author style in text. To date, no large dataset of texts with stylometric annotations on a wide range of style dimensions has been compiled, perhaps because the linguistic expertise to perform such annotation would be prohibitively expensive. Therefore, current style representation approaches make use of unsupervised neural methods to disentangle style from content to create style vectors. These approaches, however, result in uninterpretable representations, complicating their usage in downstream applications like authorship attribution where auditing and explainability are critical. In this work, we use prompting to perform stylometry on a large number of texts to generate a synthetic stylometry dataset. We then use this synthetic data to train human-interpretable style representations we call LISA embeddings. We release our synthetic dataset (StyleGenome) and our interpretable style embedding model (LISA) as resources.

Faithful Chain-of-Thought Reasoning
Qing Lyu | Shreya Havaldar | Adam Stein | Li Zhang | Delip Rao | Eric Wong | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Typed Graph Models for Learning Latent Attributes from Names
Delip Rao | David Yarowsky
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Entity Disambiguation for Knowledge Base Population
Mark Dredze | Paul McNamee | Delip Rao | Adam Gerber | Tim Finin
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Streaming Cross Document Entity Coreference Resolution
Delip Rao | Paul McNamee | Mark Dredze
Coling 2010: Posters

2009

Semi-Supervised Polarity Lexicon Induction
Delip Rao | Deepak Ravichandran
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce
Delip Rao | David Yarowsky
Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4)

𝜖-extension Hidden Markov Models and Weighted Transducers for Machine Transliteration
Balakrishnan Varadarajan | Delip Rao
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

2008

Affinity Measures Based on the Graph Laplacian
Delip Rao | David Yarowsky | Chris Callison-Burch
Coling 2008: Proceedings of the 3rd Textgraphs workshop on Graph-based Algorithms for Natural Language Processing

2007

JHU1 : An Unsupervised Approach to Person Name Disambiguation using Web Snippets
Delip Rao | Nikesh Garera | David Yarowsky
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)