Kanishka Jain


2025

pdf bib
A Benchmark for Hindi Verb-Argument Structure Alternations
Kanishka Jain | Ashwini Vaidya
Findings of the Association for Computational Linguistics: EMNLP 2025

In this paper we introduce a Hindi verb alternations benchmark to investigate whether pretrained large language models (LLMs) can infer the frame-selectional properties of Hindi verbs. Our benchmark consists of minimal pairs such as ‘Tina cut the wood’/*‘Tina disappeared the wood’. We create four variants of these alternations for Hindi to test knowledge of verbal morphology and argument case-marking. Our results show that a masked monolingual model performs the best, while causal models fare poorly. We further test the quality of the predictions using a cloze-style sentence completion task. While the models appear to infer the right mapping between verbal morphology and valency in the acceptability task, they do not generate the right verbal morphology in the cloze task. The model completions also lack pragmatic and world knowledge, crucial for making generalizations about verbal alternations. Our work points towards the need for more cross-linguistic research of verbal alternations.

2024

pdf bib
Revisiting VMWEs in Hindi: Annotating Layers of Predication
Kanishka Jain | Ashwini Vaidya
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

Multiword expressions in languages like Hindi are both productive and challenging. Hindi not only uses a variety of verbal multiword expressions (VMWEs) but also employs different combinatorial strategies to create new types of multiword expressions. In this paper we are investigating two such strategies that are quite common in the language. Firstly, we describe that VMWEs in Hindi are not just lexical but also morphological. Causatives are formed morphologically in Hindi. Second, we examine Stacked VMWEs i.e. when at least two VMWEs occur together. We suggest that the existing PARSEME annotation framework can be extended to these two phenomena without changing the existing guidelines. We also propose rule-based heuristics using existing Universal Dependency annotations to automatically identify and annotate some of the VMWEs in the language. The goal of this paper is to refine the existing PARSEME corpus of Hindi for VMWEs while expanding its scope giving a more comprehensive picture of VMWEs in Hindi.