2021
Multilingual Sequence Labeling Approach to solve Lexical Normalization
Divesh Kubal, Apurva Nagvenkar
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
The task of converting nonstandard text into standard, readable text is known as lexical normalization. Almost all Natural Language Processing (NLP) applications require normalized text to build high-quality task-specific models, and lexical normalization has been shown to improve the performance of numerous NLP tasks on social media data. This study formulates lexical normalization as a sequence labeling problem and solves it with a sequence labeling approach combined with a word-alignment technique. The goal is to use a single model to normalize text in several languages, namely Croatian, Danish, Dutch, English, Indonesian-English, German, Italian, Serbian, Slovenian, Spanish, Turkish, and Turkish-German. This is a shared task of the 7th Workshop on Noisy User-generated Text (W-NUT 2021), in which participants were asked to build a system that performs lexical normalization, i.e., the translation of non-canonical texts into their canonical equivalents, on data from 12 languages. The proposed single multilingual model achieves an overall Error Reduction Rate (ERR) of 43.75 on intrinsic evaluation and an overall Labeled Attachment Score (LAS) of 63.12 on extrinsic evaluation. Further, the proposed method achieves the highest ERR score of 61.33 among the participants in the shared task. This study highlights the benefits of using additional training data and of using a language model pre-trained on multiple languages rather than on a single language.
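The sequence labeling formulation can be illustrated with a minimal sketch. It assumes the raw and normalized tokens are already word-aligned one-to-one (a normalized entry may hold several words or be empty), uses a hypothetical `<SAME>` label for tokens that stay unchanged, and scores predictions with ERR, assuming the definition commonly used for this task: the accuracy gain over a leave-as-is baseline divided by that baseline's error rate. The names `to_labels`, `from_labels`, and `err` are illustrative and are not the authors' code; a real system would predict the labels with a multilingual pre-trained encoder rather than take them as given.

```python
from typing import List

# Hypothetical label meaning "copy the input token unchanged"; the paper's
# actual label inventory may differ.
KEEP = "<SAME>"


def to_labels(raw: List[str], norm: List[str]) -> List[str]:
    """Convert word-aligned (raw, normalized) tokens into one label per raw token.

    Assumes alignment is already done: norm[i] is the normalization of raw[i]
    and may contain spaces (one-to-many) or be "" (token deleted).
    """
    if len(raw) != len(norm):
        raise ValueError("raw and normalized tokens must be aligned 1-to-1")
    return [KEEP if r == n else n for r, n in zip(raw, norm)]


def from_labels(raw: List[str], labels: List[str]) -> List[str]:
    """Decode predicted labels back into normalized tokens."""
    if len(raw) != len(labels):
        raise ValueError("one label is needed per raw token")
    return [r if lab == KEEP else lab for r, lab in zip(raw, labels)]


def err(raw: List[str], gold: List[str], pred: List[str]) -> float:
    """Error Reduction Rate in percent, relative to the leave-as-is baseline."""
    n = len(raw)
    if n == 0 or not (n == len(gold) == len(pred)):
        raise ValueError("raw, gold and pred must be non-empty and equally long")
    acc_sys = sum(p == g for p, g in zip(pred, gold)) / n
    acc_base = sum(r == g for r, g in zip(raw, gold)) / n
    if acc_base == 1.0:
        raise ValueError("ERR is undefined when no token needs normalization")
    return 100.0 * (acc_sys - acc_base) / (1.0 - acc_base)


if __name__ == "__main__":
    raw = ["u", "r", "gr8", "!"]
    gold = ["you", "are", "great", "!"]
    print(to_labels(raw, gold))  # ['you', 'are', 'great', '<SAME>']
    # Labels a hypothetical model might predict: it misses "r" -> "are".
    pred = from_labels(raw, ["you", KEEP, "great", KEEP])
    print(round(err(raw, gold, pred), 2))  # 66.67
```

For the toy sentence `u r gr8 !` with gold `you are great !`, leaving the input unchanged scores 25% accuracy; a prediction that fixes `u` and `gr8` but not `r` reaches 75%, giving an ERR of about 66.7. Framing each token's output as a label lets a standard token-classification model handle all languages with one output layer.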
2016
IndoWordNet Conversion to Web Ontology Language (OWL)
Apurva Nagvenkar, Jyoti Pawar, Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)
WordNet plays a significant role in the Linked Open Data (LOD) cloud. It has numerous applications, ranging from ontology annotation to ontology mapping. IndoWordNet is a linked WordNet connecting 18 Indian language WordNets, with Hindi as the source WordNet. The Hindi WordNet was initially developed by linking it to the English WordNet. In this paper, we present a representation of IndoWordNet in the Web Ontology Language (OWL). The Princeton WordNet schema has been extended to support this representation. The resulting OWL representation of IndoWordNet is now available for linking with other web resources. The representation is implemented for eight Indian languages.
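To make the idea of an OWL representation concrete, here is a minimal sketch in Python (standard library only) that serializes one multilingual synset to Turtle. The `iwn:` namespace, the `iwn:partOfSpeech` and `iwn:linkedToEnglish` properties, the synset IRI pattern, and the example data are all hypothetical; the paper's extension of the Princeton WordNet schema is not reproduced here. Words are attached as language-tagged `rdfs:label` values, which is one common way to keep several languages on a single concept.

```python
from typing import Dict, List, Optional

# Hypothetical namespace and term names, for illustration only; they are not
# the schema published in the paper.
HEADER = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix iwn: <http://example.org/indowordnet#> .

iwn:Synset rdf:type owl:Class .
iwn:partOfSpeech rdf:type owl:DatatypeProperty .
iwn:linkedToEnglish rdf:type owl:ObjectProperty .
"""


def quote(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def literal(text: str, lang: str) -> str:
    """Quote a string as a language-tagged Turtle literal."""
    return f"{quote(text)}@{lang}"


def synset_to_turtle(
    synset_id: int,
    pos: str,
    gloss: str,
    words: Dict[str, List[str]],
    english_synset: Optional[str] = None,
) -> str:
    """Serialize one multilingual synset as an OWL individual in Turtle.

    `words` maps a language code (e.g. "hi", "mr") to that language's synonyms.
    The gloss is assumed to be in Hindi, the source WordNet.
    """
    labels = [literal(w, lang) for lang, ws in sorted(words.items()) for w in ws]
    if not labels:
        raise ValueError("a synset needs at least one word")
    lines = [
        f"iwn:synset_{synset_id} rdf:type owl:NamedIndividual , iwn:Synset ;",
        f"    iwn:partOfSpeech {quote(pos)} ;",
        f"    rdfs:comment {literal(gloss, 'hi')} ;",
    ]
    if english_synset is not None:
        # Cross-link to a (hypothetical) IRI for the Princeton WordNet synset.
        lines.append(f"    iwn:linkedToEnglish <{english_synset}> ;")
    lines.append("    rdfs:label " + " , ".join(labels) + " .")
    return HEADER + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(synset_to_turtle(
        1, "noun", "एक पालतू पशु",
        {"hi": ["कुत्ता"], "mr": ["कुत्रा"]},
        english_synset="http://example.org/pwn#dog-n-1",
    ))
```

The output can be loaded by any RDF/OWL toolkit that reads Turtle; a production converter would more likely build the graph with a library such as rdflib than format strings by hand.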
2015
Let Sense Bags Do Talking: Cross Lingual Word Semantic Similarity for English and Hindi
Apurva Nagvenkar, Jyoti Pawar, Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing
2014
Concept Space Synset Manager Tool
Apurva Nagvenkar, Neha Prabhugaonkar, Venkatesh Prabhu, Ramdas Karmali, Jyoti Pawar
Proceedings of the Seventh Global Wordnet Conference
2012
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach
Venkatesh Prabhu, Shilpa Desai, Hanumant Redkar, Neha Prabhugaonkar, Apurva Nagvenkar, Ramdas Karmali
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing
IndoWordNet Application Programming Interfaces
Neha Prabhugaonkar, Apurva Nagvenkar, Ramdas Karmali
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing
WordNet Website Development And Deployment using Content Management Approach
Neha Prabhugaonkar, Apurva Nagvenkar, Venkatesh Prabhu, Ramdas Karmali
Proceedings of COLING 2012: Demonstration Papers