Pankaj Choudhury


2023

Image Caption Synthesis for Low Resource Assamese Language using Bi-LSTM with Bilinear Attention
Pankaj Choudhury | Prithwijit Guha | Sukumar Nandi
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

An Annotated Corpus for Realis Event Detection in Short Stories Written in English and Low Resource Assamese Language
Chaitanya Kirti | Pankaj Choudhury | Ashish An | Prithwijit Guha
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

This paper presents an annotated corpus of Assamese and English short stories for event trigger detection. It is a pioneering effort for the short-story genre and contributes resources for it, especially in the low-resource Assamese language. In the process, 200 short stories were manually annotated in both Assamese and English. The dataset was evaluated, and several models were compared for predicting realis events, i.e., events that actually happen. However, manually annotated language resources are expensive to develop, especially when the text requires specialist knowledge to interpret. To address this, TagIT, an automated event annotation tool, is introduced. TagIT is designed to support the objective of expanding the dataset from 200 to 1,000. The best-performing model was employed in TagIT to automate the event annotation process. Extensive experiments were conducted to evaluate the quality of the expanded dataset. The study further illustrates how combining an automatic annotation tool with human-in-the-loop participation significantly reduces the time needed to generate a high-quality dataset.