Chaitanya Kirti

2026

Vrittanta-AS: Dataset Development and Benchmarking for Event Trigger Detection and Classification in Assamese
Chaitanya Kirti | Dhrubajyoti Pathak | Ashish Anand | Prithwijit Guha
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Event trigger detection and classification aim to identify and categorize events within unstructured text. While prior research has primarily focused on news or biomedical corpora, the literary domain, especially short stories, remains largely underexplored. This gap is particularly pronounced for low-resource languages such as Assamese, where limited annotated data and complex narrative structures hinder progress. To address this challenge, we introduce Vrittanta-AS, a manually curated Assamese event trigger detection and classification dataset comprising 13,171 annotated events extracted from short stories. The dataset is designed to advance research in information extraction and narrative understanding for low-resource Indian languages. We conduct a comprehensive evaluation using classical machine learning methods, neural sequential architectures, pre-trained transformer models, and large language models (LLMs) on the proposed dataset. Experimental results demonstrate that IndicBERT v2 achieves the highest performance for both event trigger detection (85.86% micro-F1) and classification (65.21% macro-F1). Vrittanta-AS serves as an important step toward developing benchmark resources for event trigger detection and classification in Assamese literary text.

Vrittanta-EN: A Benchmark Dataset for Event Trigger Detection and Classification Advancing Event Understanding in English Narrative Discourse
Chaitanya Kirti | Ashish Anand | Prithwijit Guha
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Event trigger detection and classification involve identifying meaningful occurrences and categorizing them into predefined event types within narrative text. Despite extensive research on English event extraction in factual domains like news and biomedical text, narrative prose, such as short stories, has received comparatively little attention. To bridge this gap, Vrittanta-EN introduces a manually annotated English corpus comprising 11,272 event instances extracted from diverse short stories. The dataset captures a wide range of communicative, cognitive, and physical actions typical of narrative discourse. A comprehensive evaluation is conducted across a wide range of models, including classical machine learning baselines (SVM, Naive Bayes), neural sequential models (LSTM, BiLSTM, BiLSTM-CRF), encoder-only transformers (BERT, RoBERTa, ALBERT, DistilBERT, DeBERTa, ELECTRA), and encoder-decoder models (T5, BART), along with large language models (GPT-4.1, DeepSeek-V3.2-Exp, Claude Sonnet 4) under both zero-shot and five-shot settings. Experimental results show that ELECTRA achieved the highest overall performance for event trigger detection with an F1-score of 90.61%, while RoBERTa demonstrated superior performance for event classification with a macro F1 of 74.71%. These findings highlight the robustness of contextual transformer-based architectures for modeling narrative event structures in English short stories. The dataset, code, and annotation guidelines will be publicly released upon paper acceptance.

2025

AsRED: Development and Evaluation of an Assamese Reduplication Dataset
Pankaj Choudhury | Chaitanya Kirti | Dhrubajyoti Pathak | Sukumar Nandi
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

2023

An Annotated Corpus for Realis Event Detection in Short Stories Written in English and Low Resource Assamese Language
Chaitanya Kirti | Pankaj Choudhury | Ashish Anand | Prithwijit Guha
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

This paper presents an annotated corpus of Assamese and English short stories for event trigger detection. It is a pioneering effort for the short-story genre and contributes resources for it, especially in the low-resource Assamese language. In the process, 200 short stories were manually annotated in both Assamese and English. The dataset was evaluated, and several models were compared on predicting events that are actually happening, i.e., realis events. However, manually annotated language resources are expensive to develop, especially when the text requires specialist knowledge to interpret. To address this cost, TagIT, an automated event annotation tool, is introduced. TagIT is designed to support the objective of expanding the dataset from 200 to 1,000 stories. The best-performing model was employed in TagIT to automate the event annotation process. Extensive experiments were conducted to evaluate the quality of the expanded dataset. This study further illustrates how combining an automatic annotation tool with human-in-the-loop participation significantly reduces the time needed to generate a high-quality dataset.
