Sukhada Sukhada
Also published as: Sukhada
2025
Indian Grammatical Tradition-Inspired Universal Semantic Representation Bank (USR Bank 1.0)
Soma Paul | Sukhada Sukhada | Bidisha Bhattacharjee | Kumari Riya | Sashank Tatavolu | Kamesh R | Isma Anwar | Pratibha Rani
Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)
In this paper, we introduce USR Bank 1.0, a multi-layered, text-level semantic representation framework designed to capture not only the predicate-argument structure of an utterance but also the speaker’s communicative intent as expressed linguistically. Built on the Universal Semantic Grammar (USG), which is grounded in Pāṇinian grammar and the Indian Grammatical Tradition (IGT), USR systematically encodes semantic, morpho-syntactic, discourse, and pragmatic information across distinct layers. In the USR generation process, initial USRs are automatically generated using a dedicated USR-builder tool and subsequently validated via a web-based interface (SAVI), ensuring high inter-annotator agreement and semantic fidelity. Our evaluation on Hindi texts demonstrates robust dependency and discourse annotation consistency and strong semantic similarity in USR-to-text generation. By distributing semantic-pragmatic information across layers and capturing the speaker’s perspective, USR provides a cognitively motivated, language-agnostic framework with promising applications in multilingual natural language processing.
2023
Evaluation of Universal Semantic Representation (USR)
Kirti Garg | Soma Paul | Sukhada | Riya Kumari | Fatema Bawahir
Proceedings of the Fourth International Workshop on Designing Meaning Representations
Universal Semantic Representation (USR) is designed as a language-independent information packaging system that captures information at three levels: (a) Lexico-conceptual, (b) Syntactico-Semantic, and (c) Discourse. Unlike other representations that mainly encode predicates and their argument structures, our proposed representation captures the speaker’s vivakṣā, that is, how the speaker views the activity. The idea of “speaker’s vivakṣā” is inspired by the Indian Grammatical Tradition. Because the annotation captures the speaker’s viewpoint, it may reflect some idiosyncrasy of the speaker. Hence the evaluation metrics for such resources also need to be thought through from scratch. This paper presents an extensive evaluation procedure for this semantic representation from two perspectives: (a) Inter-Annotator Agreement and (b) one downstream task, namely multilingual Natural Language Generation. We also qualitatively evaluate the experience of natural language generation by manual parsing of USR, so as to understand the readability of USR. We have achieved above 80% Inter-Annotator Agreement for USR annotations and above 80% semantic closeness in multilingual generation tasks, suggesting the reliability of USR annotations and their utility for multilingual generation. The qualitative evaluation also suggests high readability and hence the utility of USR as a semantic representation.
2021
Semantics of Spatio-Directional Geometric Terms of Indian Languages
Sukhada | Soma Paul | Rahul Kumar | Karthik Puranik
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
This paper examines widely prevalent yet little-studied expressions in Indian languages which are known as geometrical terms because “they engage locations along the axes of the reference object”. In Hindi, these terms are andara (inside), bāhara (outside), āge (in front of), sāmane (in front of), pīche (back), ūpara (above/over), nīce (under/below), dāyeṃ (right), bāyeṃ (left), pāsa (near), and dūra (away/far). The way these terms have been interpreted by scholars of the Hindi language and handled in the Hindi Dependency treebank is misleading. This paper proposes an alternative analysis of these terms focusing on their triple functions (nominal, modifier, and relational) and presents abstract semantic representations of these terms following the proposed analysis. The semantic representation will be explicit, unambiguous, abstract, and therefore universal in nature. The correspondences of these terms in Bangla and Kannada are also identified. Disambiguation of geometric terms will facilitate parsing and machine translation, especially from Indian languages to English, because these geometric terms of Indian languages are variedly translated into English depending on context.
2020
Parsing Indian English News Headlines
Samapika Roy | Sukhada | Anil Kr. Singh
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Parsing news headlines is one of the difficult tasks of Natural Language Processing, mostly because news headlines (NHs) are not complete grammatical sentences. News editors use all sorts of tricks to grab readers’ attention, for instance, unusual capitalization as in the headline ‘Ear SHOT ashok rajagopalan’; other headlines demand world knowledge, as in ‘Church reformation celebrated’, where ‘Church reformation’ refers to a historical event and not a piece of news about an ordinary church. The lack of transparency in NHs can be linguistic, cultural, social, or contextual. The limited space provided for a news headline has led to creative liberty. Though much work on NHs, such as news value extraction, summary generation, and emotion classification, has been going on, parsing them has remained a tough challenge. Linguists have also been interested in NHs for the creativity of their language, which bends traditional grammar rules. Researchers have conducted studies on news reportage, discourse analysis of NHs, and many more. While the creativity seen in NHs is fascinating for language researchers, it poses a computational challenge for Natural Language Processing researchers. This paper presents an outline of the ongoing doctoral research on the parsing of Indian English NHs. The ultimate aim of this research is to provide a module that will generate correctly parsed NHs. The intention is to enhance the broad applicability of newspaper corpora for future Natural Language Processing applications.
2015
Applying Sanskrit Concepts for Reordering in MT
Akshar Bharati | Sukhada | Prajna Jha | Soma Paul | Dipti M Sharma
Proceedings of the 12th International Conference on Natural Language Processing