We introduce Legal-CGEL, an ongoing treebanking project focused on syntactic analysis of legal English text in the CGELBank framework (Reynolds et al., 2022), with an initial focus on US statutory law. We argue that CGELBank offers unique advantages for treebanking legal English, as a formalism that extends a comprehensive—and authoritative—formal description of English syntax (the Cambridge Grammar of the English Language; Huddleston & Pullum, 2002). We discuss some analytical challenges that have arisen in extending CGELBank to the legal domain. We conclude with a summary of immediate and longer-term project goals.
Work on shallow discourse parsing in English has focused on the Wall Street Journal corpus, the only large-scale dataset for the language in the PDTB framework. However, the data is not openly available, is restricted to the news domain, and is by now 35 years old. In this paper, we present and evaluate a new open-access, multi-genre benchmark for PDTB-style shallow discourse parsing, based on the existing UD English GUM corpus, for which discourse relation annotations in other frameworks already exist. In a series of experiments on cross-domain relation classification, we show that while our dataset is compatible with PDTB, substantial out-of-domain degradation is observed, which can be alleviated by joint training on both datasets.
Euphemisms are often used to drive rhetoric, but their automated recognition and interpretation are under-explored. We investigate four methods for detecting euphemisms in sentences containing potentially euphemistic terms. The first three, linguistically motivated methods rest on an understanding of (1) euphemism's role in attenuating the harsh connotations of a taboo topic and (2) euphemism's metaphorical underpinnings. In contrast, the fourth method follows recent innovations in other tasks and employs transfer learning from a general-domain pre-trained language model. While the latter method ultimately (and perhaps surprisingly) performed best (F1 = 0.74), we comprehensively evaluate all four methods to derive additional useful insights from the negative results.