Praveen Gatla


2023

Bhojpuri WordNet: Problems in Translating Hindi Synsets into Bhojpuri
Imran Ali | Praveen Gatla
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Today, artificial intelligence systems are highly capable, but they lack the human-like capacity for understanding. In this context, sense-based lexical resources become a requirement for artificially intelligent machines. Lexical resources like wordnets have received scholarly attention because they are considered crucial sense-based resources in the field of natural language understanding. They can help identify the intended meaning of communicated texts, as they focus on concepts rather than words. Wordnets are available for only 18 Indian languages. Keeping this in mind, we have initiated the development of a comprehensive wordnet for Bhojpuri. The present paper describes the creation of Bhojpuri synsets and discusses the problems that we faced while translating Hindi synsets into Bhojpuri. These include lexical anomalies, lexical mismatches, synthesized forms, and a lack of technical words. Nearly 4,000 Hindi synsets were mapped to their equivalent synsets in Bhojpuri following the expansion approach. We have also worked on language-specific synsets, which are unique to Bhojpuri. This resource is useful in machine translation, sentiment analysis, word sense disambiguation, cross-lingual references among Indian languages, and Bhojpuri language teaching and learning.