Lakshmojee Koduru
2026
Benchmarking Models for Low-Resource Nepali Event Extraction with Trigger Phrase Identification and Event Classification
Sujal Maharjan | Astha Shrestha | Lakshmojee Koduru | Sweta Poudel | Shuvam Shiwakoti | Rabin Thapa | Kritesh Rauniyar | Surendrabikram Thapa
Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)
Research on Event Extraction (EE) in South Asian languages is crucial for understanding information dissemination and enabling automated news analysis in morphologically complex, low-resource settings. To address the scarcity of high-quality, publicly available datasets, we present Nepali Event Extraction (NepEE), a manually annotated corpus of 10,226 Devanagari sentences. The dataset is annotated for trigger spans and event types, with high inter-annotator agreement (Fleiss' kappa = 0.812 for trigger identification and 0.855 for event classification). It was developed through a rigorous, iterative three-phase protocol involving five expert native speakers to ensure linguistic precision. We benchmark a broad spectrum of approaches: classical feature-based models, five fine-tuned Transformer encoders, and contemporary instruction-tuned Large Language Models (LLMs) under zero-shot and fixed few-shot prompting. Our analysis shows that Indic-specialized Transformers achieve the best classification performance, while traditional methods and few-shot prompting struggle with exact span extraction in morphologically complex contexts. Furthermore, we quantify the performance differences between sentence-level and span-level tasks, providing strong baselines for future research. The findings and the released NepEE dataset provide a valuable resource for advancing event understanding in low-resource languages (LRLs). All code and resources are available at https://github.com/SUJAL390/EEUCA-ACL-2026-Trigger-Phrase-Identification-and-Event-Classification-in-Low-Resource-Languages.
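For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of the standard Fleiss' kappa formula in Python. It is not the authors' evaluation code. The function name `fleiss_kappa`, the count-matrix input layout, and the toy ratings are illustrative assumptions, as is the premise that every item is labeled by the same number of annotators.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Standard Fleiss' kappa (hypothetical helper, not from the paper).

    counts[i][j] is the number of annotators who assigned item i to
    category j. Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the fraction of rater pairs
    # that agree on that item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # All ratings fall in one category; kappa is undefined.
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (assumed, not from NepEE): 2 sentences, 5 annotators, 2 categories.
perfect = [[5, 0], [0, 5]]   # all annotators agree on every sentence
split = [[3, 2], [2, 3]]     # annotators are nearly evenly split
kappa_perfect = fleiss_kappa(perfect)
kappa_split = fleiss_kappa(split)
```

Values near 1 indicate agreement well above chance, and values at or below 0 indicate agreement no better than chance, which is the scale on which the reported 0.812 and 0.855 should be read.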