Soma Paul


2025

Indian Grammatical Tradition-Inspired Universal Semantic Representation Bank (USR Bank 1.0)
Soma Paul | Sukhada Sukhada | Bidisha Bhattacharjee | Kumari Riya | Sashank Tatavolu | Kamesh R | Isma Anwar | Pratibha Rani
Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)

In this paper, we introduce USR Bank 1.0, a multi-layered, text-level semantic representation framework designed to capture not only the predicate-argument structure of an utterance but also the speaker’s communicative intent as expressed linguistically. Built on the Universal Semantic Grammar (USG), which is grounded in Pāṇinian grammar and the Indian Grammatical Tradition (IGT), USR systematically encodes semantic, morpho-syntactic, discourse, and pragmatic information across distinct layers. In the USR generation process, initial USRs are automatically generated using a dedicated USR-builder tool and subsequently validated via a web-based interface (SAVI), ensuring high inter-annotator agreement and semantic fidelity. Our evaluation on Hindi texts demonstrates robust dependency and discourse annotation consistency and strong semantic similarity in USR-to-text generation. By distributing semantic-pragmatic information across layers and capturing the speaker’s perspective, USR provides a cognitively motivated, language-agnostic framework with promising applications in multilingual natural language processing.

2023

Evaluation of Universal Semantic Representation (USR)
Kirti Garg | Soma Paul | Sukhada | Riya Kumari | Fatema Bawahir
Proceedings of the Fourth International Workshop on Designing Meaning Representations

Universal Semantic Representation (USR) is designed as a language-independent information packaging system that captures information at three levels: (a) Lexico-conceptual, (b) Syntactico-Semantic, and (c) Discourse. Unlike other representations that mainly encode predicates and their argument structures, our proposed representation captures the speaker’s vivakṣā, that is, how the speaker views the activity. The idea of the “speaker’s vivakṣā” is inspired by the Indian Grammatical Tradition. Because the annotation captures the speaker’s viewpoint, it may reflect some of the speaker’s idiosyncrasy. Hence the evaluation metrics for such resources also need to be thought through from scratch. This paper presents an extensive evaluation procedure for this semantic representation from two perspectives: (a) Inter-Annotator Agreement and (b) one downstream task, namely multilingual Natural Language Generation. We also qualitatively evaluate the experience of natural language generation by manual parsing of USR, so as to understand the readability of USR. We have achieved above 80% Inter-Annotator Agreement for USR annotations and above 80% semantic closeness in multilingual generation tasks, suggesting the reliability of USR annotations and their utility for multilingual generation. The qualitative evaluation also suggests high readability and hence the utility of USR as a semantic representation.

2021

Semantics of Spatio-Directional Geometric Terms of Indian Languages
Sukhada | Soma Paul | Rahul Kumar | Karthik Puranik
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

This paper examines widely prevalent yet little-studied expressions in Indian languages which are known as geometrical terms because “they engage locations along the axes of the reference object”. In Hindi, these terms are andara (inside), bāhara (outside), āge (in front of), sāmane (in front of), pīche (back), ūpara (above/over), nīce (under/below), dāyeṃ (right), bāyeṃ (left), pāsa (near), and dūra (away/far). The way these terms have been interpreted by scholars of the Hindi language and handled in the Hindi Dependency Treebank is misleading. This paper proposes an alternative analysis of these terms focusing on their triple functions (nominal, modifier, and relational) and presents abstract semantic representations of these terms following the proposed analysis. The semantic representation will be explicit, unambiguous, abstract, and therefore universal in nature. The correspondences of these terms in Bangla and Kannada are also identified. Disambiguation of geometric terms will facilitate parsing and machine translation, especially from Indian languages to English, because these geometric terms of Indian languages are variedly translated in English depending on context.

2015

A VSM-based Statistical Model for the Semantic Relation Interpretation of Noun-Modifier Pairs
Nitesh Surtani | Soma Paul
Proceedings of the International Conference Recent Advances in Natural Language Processing

Automatic conversion of Indian Language Morphological Processors into Grammatical Framework (GF)
Harsha Vardhan Grandhi | Soma Paul
Proceedings of the 12th International Conference on Natural Language Processing

A Hybrid Approach for Bracketing Noun Sequence
Arpita Batra | Soma Paul
Proceedings of the 12th International Conference on Natural Language Processing

Applying Sanskrit Concepts for Reordering in MT
Akshar Bharati | Sukhada | Prajna Jha | Soma Paul | Dipti M Sharma
Proceedings of the 12th International Conference on Natural Language Processing

2014

A Two-Stage Approach for Computing Associative Responses to a Set of Stimulus Words
Urmi Ghosh | Sambhav Jain | Soma Paul
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

Creating a PurposeNet Ontology: An insight into the issues encountered during ontology creation
Rishabh Srivastava | Soma Paul
Proceedings of the 11th International Conference on Natural Language Processing

Translation of TO infinitives in Anusaaraka Platform: an English Hindi MT system
Akshar Bharati | Sukhada | Soma Paul
Proceedings of the 11th International Conference on Natural Language Processing

A hybrid approach for automatic clause boundary identification in Hindi
Rahul Sharma | Soma Paul
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

2013

IIIT-H: A Corpus-Driven Co-occurrence Based Probabilistic Model for Noun Compound Paraphrasing
Nitesh Surtani | Arpita Batra | Urmi Ghosh | Soma Paul
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Automatic Clause Boundary Annotation in the Hindi Treebank
Rahul Sharma | Soma Paul | Riyaz Ahmad Bhat | Sambhav Jain
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

2012

Semantic Processing of Compounds in Indian Languages
Amba Kulkarni | Soma Paul | Malhar Kulkarni | Anil Kumar | Nitesh Surtani
Proceedings of COLING 2012

Automatic Annotation of Genitives in Hindi Treebank
Nitesh Surtani | Soma Paul
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages

Automatic Tripartite Classification of Intransitive Verbs
Nitesh Surtani | Soma Paul
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

Hybrid Approach for the Interpretation of Nominal Compounds using Ontology
Sruti Rallapalli | Soma Paul
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

2010

Syntactic Construct: An Aid for translating English Nominal Compound into Hindi
Soma Paul | Prashant Mathur | Sushant Kishore
Proceedings of the NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics

2009

All Words Unsupervised Semantic Category Labeling for Hindi
Siva Reddy | Abhilash Inumella | Rajeev Sangal | Soma Paul
Proceedings of the International Conference RANLP-2009