Abhishek Purushothama

2026

Syntax as a Rosetta Stone: Universal Dependencies for In-Context Coptic Translation
Abhishek Purushothama | Emma Thronson | Alexia Guo | Amir Zeldes
Findings of the Association for Computational Linguistics: ACL 2026

This paper proposes a novel in-context learning approach to support low-resource machine translation for the Coptic language, using prompts based on Universal Dependencies parses of input sentences. Building on existing work using bilingual dictionaries to support inference for vocabulary items, we add several representations of syntactic analyses to our inputs, specifically exploring the inclusion of raw parser outputs, verbalizations of parses in plain English, and explanations of specific difficult constructions identified in input subgraphs and how they can be translated. Our results show that while syntactic information alone is not as useful as dictionary-based glosses, combining retrieved dictionary items with syntactic information yields significant gains across model sizes and sets new state-of-the-art results for the language.

Sense and Sensitivity: “Reasoning” Models are More Robust, but can Diverge from Human Consensus in a Legal Interpretation Task
Dawson Petersen | Abhishek Purushothama | Nathan Schneider
Proceedings of the 30th Conference on Computational Natural Language Learning

Can LLMs make metalinguistic judgments? While LLM embeddings are often regarded as high-quality semantic representations, it is not clear that prompting an LLM is a useful way to obtain metalinguistic insights (e.g., whether a DIY gun kit is a “firearm”). While some prior work has suggested LLM prompting can simulate surveys with human participants, computational studies in the domain of legal interpretation have found that LLMs are unreliable for metalinguistic judgments due to prompt sensitivity. However, these studies did not directly compare humans and LLMs on identical tasks, nor did they test so-called “reasoning” models. The current study addresses these gaps by directly comparing the robustness of human and LLM judgments (with and without reasoning) in an English-language legal interpretation task. Our results show that LLMs were more sensitive to irrelevant prompt features than human participants. Enabling reasoning improved the stability of LLM responses. However, even reasoning model outputs had only moderate correlations with human judgments, and all models sometimes output interpretations that no humans reached in response to the same prompt. We conclude that while reasoning decreases prompt sensitivity, LLMs are still poor proxies for human metalinguistic judgments.

2025

Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments
Abhishek Purushothama | Junghyun Min | Brandon Waldon | Nathan Schneider
Proceedings of the Natural Legal Language Processing Workshop 2025

Legal interpretation frequently involves assessing how a legal text, as understood by an ‘ordinary’ speaker of the language, applies to the set of facts characterizing a legal dispute. Recent scholarship has proposed that legal practitioners add large language models (LLMs) to their interpretive toolkit. This work offers an empirical argument against LLM-assisted interpretation as recently practiced by legal scholars and federal judges. Our investigation in English shows that models do not provide stable interpretive judgments and are susceptible to subtle variations in the prompt. While instruction tuning slightly improves model calibration to human judgments, even the best-calibrated LLMs remain weak predictors of human native speakers’ judgments.

DeDisCo at the DISRPT 2025 Shared Task: A System for Discourse Relation Classification
Zhuoxuan Ju | Jingni Wu | Abhishek Purushothama | Amir Zeldes
Proceedings of the 4th Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2025)

This paper presents DeDisCo, Georgetown University’s entry in the DISRPT 2025 shared task on discourse relation classification. We test two approaches: one using an mT5-based encoder and a decoder-based one using the openly available Qwen model. We also experiment with training on an augmented dataset for low-resource languages, using matched data translated automatically from English, as well as with some additional linguistic features inspired by entries in previous editions of the shared task. Our system achieves a macro-accuracy score of 71.28, and we provide some interpretation and error analysis for our results.

2024

Getting The Most Out of Your Training Data: Exploring Unsupervised Tasks for Morphological Inflection
Abhishek Purushothama | Adam Wiemerslage | Katharina von der Wense
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Pre-trained transformers such as BERT have been shown to be effective in many natural language tasks. However, they are under-explored for character-level sequence-to-sequence tasks. In this work, we investigate pre-training transformers for the character-level task of morphological inflection in several languages. We compare various training setups and secondary tasks where unsupervised data taken directly from the target task is used. We show that training on secondary unsupervised tasks increases inflection performance even without any external data, suggesting that models learn from the additional unsupervised tasks themselves, not just from additional data. We also find that this does not hold true for specific combinations of secondary task and training setup, which has interesting implications for denoising objectives in character-level tasks.
