Sobha Lalitha Devi

Also published as: Lalitha Devi Sobha, Sobha Lalitha Devi


2022

Classification of Multiword Expressions in Malayalam
Treesa Cyriac | Sobha Lalitha Devi
Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference

Multiword expressions (MWEs) are an interesting phenomenon in language, and the MWEs of a language are not easy for a non-native speaker to understand. They include lexicalized phrases, idioms, collocations, etc. Data on multiwords are helpful in language processing. Multiword expressions in Malayalam are a less studied area. In this paper, we explore multiwords in Malayalam and classify them according to three idiosyncrasies: semantic idiosyncrasy, syntactic idiosyncrasy, and statistical idiosyncrasy. Though these idiosyncrasies have already been identified, they have not been studied for Malayalam. We present the classification and its features and examine them using Malayalam multiwords. Through this study, we identify how linguistic features of Malayalam such as agglutination influence its multiword expressions in terms of pronunciation and spelling. Malayalam also has a set of code-mixed multiword expressions, which this study addresses as well.
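Statistical idiosyncrasy is usually understood as words co-occurring more often than chance would predict. The abstract does not say which association measure the paper uses. The sketch below assumes pointwise mutual information (PMI) over adjacent word pairs, a common choice. The function name and toy tokens are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return PMI (log base 2) for every adjacent word pair in `tokens`.

    Assumption: statistical idiosyncrasy is approximated by PMI over
    adjacent bigrams; this is an illustrative measure, not the paper's.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # Positive PMI: the pair occurs more often than independence predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


print(bigram_pmi(["a", "b", "a", "b", "c"]))
```

On real data, PMI overrates rare pairs, so candidates are normally filtered by a minimum frequency before ranking.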

Automatic Identification of Explicit Connectives in Malayalam
Kumari Sheeja S | Sobha Lalitha Devi
Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference

This work presents the automatic identification of explicit discourse connectives and their arguments using a supervised method, Conditional Random Fields (CRFs). We consider only explicit connectives and their arguments in the present study. The corpus consists of 4,000 sentences from Malayalam documents, which we manually annotated for POS, chunk, clause, discourse connectives and their arguments. This annotated corpus is used to build the base engine. The system's performance is evaluated in terms of precision, recall and F-score, and the results obtained are encouraging. We analysed the errors generated by the system and used the features obtained from this analysis to improve its performance.
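The abstract reports precision, recall and F-score but does not state the matching criterion. The sketch below assumes that a predicted connective counts as correct only when its span exactly matches a gold span. Spans are given as hypothetical (start, end) token offsets.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end) spans.

    Assumption: a prediction is correct only if it matches a gold span
    exactly; partial-overlap credit is not given.
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives: spans found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


print(prf({(0, 1), (5, 6), (9, 10)}, {(0, 1), (5, 6), (12, 13), (14, 15)}))
```

Exact match is the strictest option. Argument spans are often also scored with partial-overlap credit, which would need a different comparison.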

2021

Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Sivaji Bandyopadhyay | Sobha Lalitha Devi | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Dependency Parsing in a Morphological rich language, Tamil
Vijay Sundar Ram | Sobha Lalitha Devi
Proceedings of the First Workshop on Parsing and its Applications for Indian Languages

Dependency parsing is the process of analysing the grammatical structure of a sentence based on the dependencies between its words. Dependency annotation is done using different formalisms: at the word level, for example Universal Dependencies, and at the chunk level, for example AnnaCorra. Though dependency parsing has been studied in depth for languages such as English and Czech, the same approaches cannot be directly adopted for morphologically rich and agglutinative languages. In this paper, we discuss the development of a dependency parser for Tamil, a South Dravidian language. Several characteristics of the language make this task challenging. Tamil, a morphologically rich and agglutinative language, has copula drop, accusative and genitive case drop, and pro-drop. Coordinate constructions are formed by affixing the morpheme ‘um’. Embedded clausal structures are common in relative participle and complementizer clauses. We discuss our approach to handling some of these challenges. We use MaltParser, an implementation based on supervised learning. We obtain an Unlabelled Attachment Score of 79.27%, a Labelled Attachment Score of 73.64% and a Labelled Accuracy of 68.82%.
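The abstract does not define its three scores. Under the usual definitions (as in the CoNLL dependency-parsing shared tasks), they are:

- UAS: the share of tokens whose predicted head is correct.
- LAS: the share of tokens whose head and label are both correct.
- Label accuracy: the share of tokens whose label alone is correct.

The sketch below assumes these standard definitions, includes every token (punctuation is not excluded), and represents each token as a hypothetical (head_index, label) pair with made-up labels.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS, LA) as percentages.

    `gold` and `predicted` are equal-length lists of (head, label) tuples,
    one per token; this representation is an assumption for illustration.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned")
    n = len(gold)
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    both_ok = sum(g == p for g, p in zip(gold, predicted))
    label_ok = sum(g[1] == p[1] for g, p in zip(gold, predicted))
    return 100 * head_ok / n, 100 * both_ok / n, 100 * label_ok / n


gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "mod")]
pred = [(2, "subj"), (0, "root"), (1, "obj"), (3, "adv")]
print(attachment_scores(gold, pred))
```

Under these definitions, label accuracy can never be lower than LAS, because every token counted for LAS also has a correct label.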

2020

Handling Noun-Noun Coreference in Tamil
Vijay Sundar Ram | Sobha Lalitha Devi
Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation

Natural language understanding is a vital requirement for automatic document processing tools. To achieve it, an automatic system has to understand the coherence in a text, and coreference chains bring coherence to the text. The commonly occurring reference markers that bring cohesiveness are pronominals, reflexives, reciprocals, distributives, one-anaphors and noun-noun reference. In this paper, we deal with noun-noun reference in Tamil. We present a methodology to resolve these noun-noun anaphors and discuss the challenges in handling noun-noun anaphoric relations in Tamil.

A Deeper Study on Features for Named Entity Recognition
Malarkodi C S | Sobha Lalitha Devi
Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation

This paper examines the various features used for the identification of named entities. The performance of a machine learning system depends heavily on feature selection. Our aim of tracing the essential features required for building named entity systems across languages motivated this study. We carried out a linguistic analysis to find the part-of-speech patterns in the context surrounding named entities. From these observations, we identified linguistically oriented features for both Indian and European languages. The languages used in this work are:

- Dravidian: Tamil, Telugu and Malayalam.
- Indo-Aryan: Hindi, Punjabi, Bengali and Marathi.
- European: English, Spanish, Dutch, German and Hungarian.

The system was developed using Conditional Random Fields (CRFs). Experiments were conducted using the linguistic features, and the results obtained for each language are comparable with state-of-the-art systems.

2019

Resolving Pronouns for a Resource-Poor Language, Malayalam Using Resource-Rich Language, Tamil.
Sobha Lalitha Devi
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper we describe in detail how a resource-rich language can be used to resolve pronouns in a resource-poor language. In this study, the source (resource-rich) language is Tamil and the resource-poor language is Malayalam; both belong to the Dravidian language family. The pronominal resolution system developed for Tamil uses CRFs. Our approach is to apply the model trained on Tamil to Malayalam data, and we detail the processing the Malayalam data requires. The syntactic similarity between the two languages is exploited when identifying the features for the Tamil model. The word form (lexical item) is not used as a feature for training the CRFs. Evaluation on Malayalam Wikipedia data shows that the approach works: the results, though not as good as those for Tamil, are comparable.
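The key point above is that the CRF is trained without word forms. The model therefore relies only on features that carry over between Tamil and Malayalam. The sketch below shows delexicalised feature extraction under stated assumptions:

- Tokens are hypothetical dictionaries with `word`, `pos` and `case` keys.
- The feature names and the ±1 POS window are illustrative, not the paper's actual feature set.

```python
def delexicalised_features(sentence, i):
    """Features for token i that deliberately omit the word form.

    `sentence` is a list of dicts with 'word', 'pos' and 'case' keys
    (hypothetical schema). Only non-lexical fields are read, so a model
    trained on one language can be applied to another.
    """
    token = sentence[i]
    feats = {"pos": token["pos"], "case": token["case"]}
    # POS of the neighbouring tokens, with a marker at sentence edges.
    for offset in (-1, 1):
        j = i + offset
        key = f"pos[{offset:+d}]"
        feats[key] = sentence[j]["pos"] if 0 <= j < len(sentence) else "BOUNDARY"
    return feats


sentence = [
    {"word": "avan", "pos": "PRP", "case": "nom"},
    {"word": "vannu", "pos": "VM", "case": "none"},
]
print(delexicalised_features(sentence, 0))
```

Because `word` is never read, tokens with identical POS and case annotations produce identical features in either language.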

2017

Scalable Bio-Molecular Event Extraction System towards Knowledge Acquisition
Pattabhi RK Rao | Sindhuja Gopalan | Sobha Lalitha Devi
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

Co-reference Resolution in Tamil Text
Vijay Sundar Ram | Sobha Lalitha Devi
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

Cross Linguistic Variations in Discourse Relations among Indian Languages
Sindhuja Gopalan | Lakshmi S | Sobha Lalitha Devi
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

How to Handle Split Antecedents in Tamil?
Vijay Sundar Ram | Sobha Lalitha Devi
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

BioDCA Identifier: A System for Automatic Identification of Discourse Connective and Arguments from Biomedical Text
Sindhuja Gopalan | Sobha Lalitha Devi
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

This paper describes a natural language processing system developed for the automatic identification of explicit connectives, their senses and their arguments. Prior work has shown that differences in the usage of connectives across corpora negatively affect cross-domain connective identification, which makes a domain-specific discourse parser indispensable. Here, we present a corpus of Medline abstracts annotated with discourse relations, and we calculate the kappa score to check its annotation quality. Previous work on discourse analysis in biomedical data has concentrated only on the identification of connectives. We have therefore developed an end-to-end parser for connective and argument identification using the Conditional Random Fields algorithm. The type and sub-type of the connective sense are also identified. The results obtained are encouraging.
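The abstract says a kappa score is used to check annotation quality but does not say which variant. The sketch below assumes Cohen's kappa for two annotators labelling the same items. The sense labels in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty, equal-length label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout; kappa undefined
    return (observed - expected) / (1 - expected)


a = ["Temporal", "Contingency", "Temporal", "Comparison"]
b = ["Temporal", "Contingency", "Comparison", "Comparison"]
print(cohens_kappa(a, b))
```

For more than two annotators, a multi-rater variant such as Fleiss' kappa would be needed instead.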

2015

A Hybrid Discourse Relation Parser in CoNLL 2015
Sobha Lalitha Devi | Sindhuja Gopalan | Lakshmi S. | Pattabhi RK Rao | Vijay Sundar Ram | Malarkodi C.S.
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

2014

A Generic Anaphora Resolution Engine for Indian Languages
Sobha Lalitha Devi | Vijay Sundar Ram | Pattabhi RK Rao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Automatic Conversion of Dialectal Tamil Text to Standard Written Tamil Text using FSTs
Marimuthu K | Sobha Lalitha Devi
Proceedings of the 2014 Joint Meeting of SIGMORPHON and SIGFSM

2013

Malayalam Clause Boundary Identifier: Annotation and Evaluation
Sobha Lalitha Devi | Lakshmi S
Proceedings of the 4th Workshop on South and Southeast Asian Natural Language Processing

2012

How Human Analyse Lexical Indicators of Sentiments- A Cognitive Analysis Using Reaction-Time
Marimuthu K | Sobha Lalitha Devi
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology

Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages
Dipti Misra Sharma | Prashanth Mannem | Joseph vanGenabith | Sobha Lalitha Devi | Radhika Mamidi | Ranjani Parthasarathi
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages

Tamil NER - Coping with Real Time Challenges
Malarkodi C.S | Pattabhi RK Rao | Sobha Lalitha Devi
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages

Clause Boundary Identification for Malayalam Using CRF
Lakshmi S. | Vijay Sundar Ram R | Sobha Lalitha Devi
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages

Resolution for Pronouns in Tamil Using CRF
Akilandeswari A | Sobha Lalitha Devi
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages

2011

Hybrid Approach for Coreference Resolution
Lalitha Devi Sobha | Pattabhi RK Rao | R. Vijay Sundar Ram | CS. Malarkodi | A. Akilandeswari
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

2010

An alternate approach towards meaningful lyric generation in Tamil
Ananth Ramakrishnan A | Sobha Lalitha Devi
Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity

How to Get the Same News from Different Language News Papers
T. Pattabhi R. K Rao | Sobha Lalitha Devi
Proceedings of the 4th Workshop on Cross Lingual Information Access

2009

Automatic Generation of Tamil Lyrics for Melodies
Ananth Ramakrishnan A | Sankar Kuppan | Sobha Lalitha Devi
Proceedings of the Workshop on Computational Approaches to Linguistic Creativity