2020
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Vishal Goyal, Asif Ekbal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Development of Hybrid Algorithm for Automatic Extraction of Multiword Expressions from Monolingual and Parallel Corpus of English and Punjabi
Kapil Dev Goyal, Vishal Goyal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Identification and extraction of Multiword Expressions (MWEs) is a hard and challenging task in various Natural Language Processing applications such as Information Retrieval (IR), Information Extraction (IE), Question-Answering systems, Speech Recognition and Synthesis, Text Summarization and Machine Translation (MT). A Multiword Expression consists of two or more consecutive words that are treated as a single unit, and its actual meaning cannot be derived from the meanings of the individual words. If a system treats such an expression as separate words, its results will be incorrect. Therefore, these expressions must be identified to improve the system's results. In this report, our main focus is to develop an automated tool to extract Multiword Expressions from monolingual and parallel corpora of English and Punjabi. The tool combines rule-based, linguistic, statistical and other approaches to identify and extract MWEs, and achieves an F-score above 90% for some types of MWEs.
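The abstract does not say which statistical measures the tool uses. As one illustration of a statistical approach to MWE candidate extraction, the minimal sketch below ranks adjacent word pairs by pointwise mutual information (PMI), assuming whitespace-tokenized text. The function name `pmi_bigrams` and the `min_count` threshold are hypothetical, and this is not the authors' hybrid algorithm.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by PMI = log2(P(x, y) / (P(x) * P(y))).

    Pairs that co-occur far more often than chance are candidate MWEs.
    Hypothetical helper, not the tool described in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:  # rare pairs give unreliable PMI estimates
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


corpus = "i love new york . new york is big . i love tea . we love it".split()
ranked = pmi_bigrams(corpus)
print(ranked[0][0])  # top pair: ('new', 'york')
```

PMI alone favours rare pairs, which is why a frequency cut-off is applied; a hybrid system would typically filter such candidates further with linguistic or rule-based checks.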
Punjabi to English Bidirectional NMT System
Kamal Deep, Ajit Kumar, Vishal Goyal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Machine Translation has been an active research area for the last few decades. Today, corpus-based Machine Translation systems are very popular. Statistical Machine Translation and Neural Machine Translation both rely on parallel corpora. In this research, a bidirectional Punjabi-English Neural Machine Translation system is developed. To improve the accuracy of the system, Word Embedding and Byte Pair Encoding are used. The reported BLEU score is 38.30 for Punjabi to English translation and 36.96 for English to Punjabi translation.
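Byte Pair Encoding learns a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. The sketch below follows the widely used merge-learning procedure for NMT; the function names are hypothetical, and the abstract does not state which BPE implementation or number of merges the system used (production systems usually rely on tools such as subword-nmt or SentencePiece).

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every whole-symbol occurrence of `pair` with its concatenation."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Return the ordered list of merges learned from a word-frequency table."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first one on ties)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

At translation time the learned merges are applied in order to split rare Punjabi or English words into known subwords, which reduces out-of-vocabulary tokens.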
EXTRACTING PARALLEL PHRASES FROM COMPARABLE ENGLISH AND PUNJABI CORPORA USING AN INTEGRATED APPROACH
Manpreet Singh Lehal, Vishal Goyal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Machine translation from English to Indian languages is a difficult task due to the unavailability of good-quality corpora and the morphological richness of Indian languages. For a system to produce better translations, the corpus needs to be very large. We employed three similarity and distance measures and developed software that automatically extracts parallel data from comparable corpora with high precision using minimal resources. The software is built on four algorithms: three compute Cosine Similarity, Euclidean Distance Similarity and Jaccard Similarity, and the fourth integrates the outputs of these three in order to improve the efficiency of the system.
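The three measures can be sketched over bag-of-words representations of a candidate phrase pair. Two details are assumptions not stated in the abstract: Punjabi tokens are first projected into English through a bilingual dictionary (the toy `BILINGUAL_DICT` here is hypothetical), and the fourth, integrating step is modelled as a simple majority vote over a hypothetical `threshold`. Euclidean distance is turned into a similarity with `1 / (1 + d)`, also an assumed choice.

```python
import math
from collections import Counter

# Toy English-Punjabi dictionary (hypothetical), used to project Punjabi
# tokens into English so both sides share one vocabulary.
BILINGUAL_DICT = {"ਵੱਡਾ": "big", "ਘਰ": "house"}


def project(punjabi_tokens):
    return [BILINGUAL_DICT.get(t, t) for t in punjabi_tokens]


def cosine_similarity(a, b):
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_similarity(a, b):
    va, vb = Counter(a), Counter(b)
    dist = math.sqrt(sum((va[t] - vb[t]) ** 2 for t in set(va) | set(vb)))
    return 1.0 / (1.0 + dist)  # assumed mapping of distance to (0, 1]


def jaccard_similarity(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def is_parallel(english_tokens, punjabi_tokens, threshold=0.5):
    """Integrate the three measures by majority vote (assumed strategy)."""
    projected = project(punjabi_tokens)
    scores = (
        cosine_similarity(english_tokens, projected),
        euclidean_similarity(english_tokens, projected),
        jaccard_similarity(english_tokens, projected),
    )
    return sum(s >= threshold for s in scores) >= 2


print(is_parallel(["big", "house"], ["ਵੱਡਾ", "ਘਰ"]))   # True
print(is_parallel(["red", "water"], ["ਵੱਡਾ", "ਘਰ"]))   # False
```

Voting lets one noisy measure be outvoted by the other two, which is one plausible way an integration step could raise precision.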
Urdu To Punjabi Machine Translation System
Umrinder Pal Singh, Vishal Goyal, Gurpreet Lehal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Machine Translation is a popular area of NLP research. There are various approaches to developing a machine translation system, such as rule-based, statistical, neural and hybrid. A rule-based system relies on grammatical rules and bilingual lexicons. Statistical and neural systems use large parallel corpora to train their respective models, whereas a hybrid MT system combines different approaches. Nowadays, corpus-based machine translation systems are quite popular in NLP research, but these models demand huge parallel corpora. In this research, we used a hybrid approach to develop an Urdu to Punjabi machine translation system that combines statistical methods with various sub-systems based on linguistic rules. The system yields 80% accuracy on sets of sentences from domains such as Politics, Entertainment, Tourism, Sports and Health. The complete system has been developed in the C#.NET programming language.
Sentiment Analysis of English-Punjabi Code-Mixed Social Media Content
Mukhtiar Singh, Vishal Goyal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Sentiment analysis is a field of study that analyzes people’s emotions, such as Nice, Happy, ਦੁਖੀ (sad), changa (good), etc., towards the entities and attributes expressed in written text. It has been observed that on microblogging websites (Facebook, YouTube, Twitter), most people use more than one language to express their emotions. Switching from one language to another within the same written text is called code-mixing. In this research, we gathered an English-Punjabi code-mixed corpus from microblogging websites. We performed language identification on the code-mixed text, which includes phonetic typing, abbreviations, wordplay, intentionally misspelled words and slang words. We then tokenized English and Punjabi words that appear with different spellings. Next, we performed lexicon-based sentiment analysis on this text. The dictionary created for English-Punjabi code-mixed text consists of opinionated words, which are categorized into three lists: positive words, negative words and neutral words. The remaining words are stored in an unsorted word list. Using an N-gram approach, a statistical technique is applied to determine the sentence-level sentiment polarity of the English-Punjabi code-mixed dataset. Our results show an accuracy of 83% with an F1 measure of 77%.
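A minimal sketch of the lexicon step, assuming whitespace tokenization. The word lists are hypothetical stand-ins for the paper's dictionary, the +1/-1/0 weights and summed score are assumptions, and checking bigrams before single words is only a simplified stand-in for the N-gram technique the abstract mentions. Words found in no list go to the unsorted list, as described above.

```python
# Hypothetical word lists standing in for the paper's code-mixed dictionary.
POSITIVE = {"nice", "happy", "changa"}
NEGATIVE = {"sad", "ਦੁਖੀ", "changa nahi"}  # "changa nahi" ~ "not good"
NEUTRAL = {"movie", "film"}

# Assumed weights: +1 positive, -1 negative, 0 neutral.
LEXICON = {
    **{w: 0 for w in NEUTRAL},
    **{w: 1 for w in POSITIVE},
    **{w: -1 for w in NEGATIVE},
}


def sentence_polarity(sentence, unsorted):
    """Sum lexicon scores, preferring bigram matches over single words."""
    tokens = sentence.lower().split()
    score, i = 0, 0
    while i < len(tokens):
        # Try the bigram first so "changa nahi" is scored as one negative unit.
        if i + 1 < len(tokens):
            bigram = f"{tokens[i]} {tokens[i + 1]}"
            if bigram in LEXICON:
                score += LEXICON[bigram]
                i += 2
                continue
        word = tokens[i]
        if word in LEXICON:
            score += LEXICON[word]
        else:
            unsorted.add(word)  # not in any opinion list
        i += 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


unsorted_words = set()
print(sentence_polarity("movie bahut changa si", unsorted_words))  # positive
print(sentence_polarity("movie changa nahi", unsorted_words))      # negative
```

Matching the bigram first matters here: scoring word by word would count "changa" as positive and miss the negation.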
Airport Announcement System for Deaf
Rakesh Kumar, Vishal Goyal, Lalit Goyal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Members of the hearing-impaired community feel very uncomfortable when travelling through or visiting an airport without the help of a human interpreter. Hearing-impaired people cannot hear the announcements made at the airport, such as which flight is heading to which destination. Without an interpreter, they do not know which gate or counter to choose, and they cannot even find out whether a flight is on time, delayed or cancelled. The Airport Announcement System for Deaf is a rule-based MT system. It is the first system developed in the domain of public places to translate all the announcements made at airports into Indian Sign Language (ISL) synthetic animations. The system is developed using Python and the Flask framework. It accepts announcements in the form of English text as input and produces ISL synthetic animations as output.
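Because airport announcements follow a small set of fixed phrasings, a rule-based system can match each announcement against templates and emit a sequence of sign glosses for the animation back end. The sketch below shows that idea only; the regular expressions, the gloss names and their ISL ordering are hypothetical and are not taken from the system described above. The Flask web layer is omitted.

```python
import re

# Hypothetical announcement templates and ISL gloss orders.
RULES = [
    (re.compile(r"flight (?P<flight>\w+) to (?P<city>\w+) is delayed", re.I),
     ["FLIGHT", "{flight}", "{city}", "DELAY"]),
    (re.compile(r"flight (?P<flight>\w+) to (?P<city>\w+) will depart "
                r"from gate (?P<gate>\d+)", re.I),
     ["FLIGHT", "{flight}", "{city}", "GATE", "{gate}", "GO"]),
]


def announcement_to_gloss(text):
    """Return the gloss sequence for the first matching rule, else None."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Fill slots such as {city} with the captured words.
            return [slot.format(**match.groupdict()).upper() for slot in template]
    return None


print(announcement_to_gloss("Flight AI101 to Delhi is delayed"))
# ['FLIGHT', 'AI101', 'DELHI', 'DELAY']
```

Each gloss would then be looked up in a sign lexicon and rendered as an animation; unmatched announcements return `None` so they can be flagged rather than mistranslated.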
Railway Stations Announcement System for Deaf
Rakesh Kumar, Vishal Goyal, Lalit Goyal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Members of the hearing-impaired community feel very uncomfortable when travelling through or visiting railway stations without the help of a human interpreter. Hearing-impaired people cannot hear the announcements made at railway stations, such as which train is heading to which destination. Without an interpreter, they do not know which platform or counter to choose, and they cannot even find out whether a train is on time, delayed or cancelled. The Railway Stations Announcement System for Deaf is a rule-based MT system. It is the first system developed in the domain of public places to translate all the announcements made at railway stations into Indian Sign Language (ISL) synthetic animations. The system is developed using Python and the Flask framework. It accepts announcements in the form of English text as input and produces ISL synthetic animations as output.
Automatic Translation of Complex English Sentences to Indian Sign Language Synthetic Video Animations
Deepali Goyal, Vishal Goyal, Lalit Goyal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Sign language is the natural way for the deaf community to express thoughts and feelings. It is a visual, non-verbal language that hearing-impaired people use to communicate their feelings to one another. Although technological development has made instant communication quite easy, a lot of work still needs to be done in the field of sign language automation to improve the quality of life of the deaf community. Traditional approaches represent signs as videos or text, which are expensive, time-consuming and not easy to use. In this research work, an attempt is made to convert complex and compound English sentences to Indian Sign Language (ISL) using synthetic video animations. The translation architecture includes a parsing module that reduces the input complex or compound English sentences to simplified versions using complex-to-simple and compound-to-simple English grammar rules, respectively. The simplified sentence is then forwarded to the conversion segment, which rearranges the English words into the corresponding ISL order using the devised grammar rules. The next segment removes unwanted words (stop words) from the sentence generated by the ISL grammar rules. This removal is important because ISL uses only the meaningful words of a sentence and omits linking verbs, helping verbs and the like. After passing through the eliminator segment, the sentence is sent to the concordance segment, which converts each word in the sentence into its lemma.
The lemma, or base form, is required for each word because sign language uses only base words, unlike other languages, which use gerunds, suffixes, three forms of verbs, and different kinds of nouns, adjectives and pronouns in their sentences. Each word of the sentence is looked up in a lexicon that contains English words with their HamNoSys notation, and words that are not in the lexicon are replaced by their synonyms. The words of the sentence are then replaced by their corresponding HamNoSys codes. If a word is not present in the lexicon, the HamNoSys code of each letter of the word is used in sequence. The HamNoSys code is converted into SiGML tags (a form of XML tags), and these SiGML tags are then sent to the animation module, which converts the SiGML code into a synthetic animation using an avatar (a computer-generated animated character).
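The stages from reordering to HamNoSys lookup can be sketched for a sentence that is already simple (the complex-to-simple parsing step and the SiGML conversion are omitted). Every resource below is a hypothetical placeholder: the tiny verb, stop-word, lemma, synonym and lexicon tables, the "move verbs to the end" rule used as a stand-in for the devised ISL grammar rules, and the `HNS<...>` strings, which stand in for real HamNoSys codes.

```python
# Toy resources; every entry is a hypothetical placeholder, and the
# "HNS<...>" strings stand in for real HamNoSys codes.
VERBS = {"go", "going", "went", "read", "reads"}
STOP_WORDS = {"am", "is", "are", "was", "the", "a", "an", "to"}
LEMMAS = {"going": "go", "went": "go", "reads": "read", "books": "book"}
SYNONYMS = {"residence": "home"}
LEXICON = {"i": "HNS<i>", "go": "HNS<go>", "home": "HNS<home>",
           "read": "HNS<read>", "book": "HNS<book>"}
LETTERS = {c: f"HNS<{c}>" for c in "abcdefghijklmnopqrstuvwxyz"}


def reorder_to_isl(tokens):
    """Assumed ISL rule for a simple clause: move verbs to the end (SOV)."""
    return [t for t in tokens if t not in VERBS] + [t for t in tokens if t in VERBS]


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    return [LEMMAS.get(t, t) for t in tokens]


def lookup_hamnosys(tokens):
    codes = []
    for word in tokens:
        if word not in LEXICON:
            word = SYNONYMS.get(word, word)  # fall back to a synonym
        if word in LEXICON:
            codes.append(LEXICON[word])
        else:
            # Still unknown: use the code of each letter in sequence.
            codes.extend(LETTERS[c] for c in word if c in LETTERS)
    return codes


def simple_sentence_to_codes(sentence):
    tokens = sentence.lower().split()
    tokens = reorder_to_isl(tokens)
    tokens = remove_stop_words(tokens)
    tokens = lemmatize(tokens)
    return lookup_hamnosys(tokens)


print(simple_sentence_to_codes("I am going to the residence"))
# ['HNS<i>', 'HNS<home>', 'HNS<go>']
print(simple_sentence_to_codes("Ram went home"))
# ['HNS<r>', 'HNS<a>', 'HNS<m>', 'HNS<home>', 'HNS<go>']
```

The second example shows the letter-by-letter fallback: "Ram" has neither a lexicon entry nor a synonym, so it is spelled out.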
Plagiarism Detection Tool for Indian Languages with Special focus on Hindi and Punjabi
Vishal Goyal, Rajeev Puri, Jitesh Pubreja, Jaswinder Singh
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Plagiarism is closely linked with Intellectual Property Rights and copyright laws, both of which have been formed to protect the ownership of original ideas. When tested with sample Punjabi text, most of the available plagiarism detection tools failed to recognise the Punjabi text, and those that did support it performed only simple string comparison to detect suspected copy-paste plagiarism, ignoring other forms of plagiarism such as word switching, synonym replacement and sentence switching.
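The difference between simple string comparison and detection of the other forms named above can be sketched as follows. This is an illustration under stated assumptions, not the tool's method: words are normalised through a hypothetical synonym list, sentences are compared as word sets (so word order does not matter), and each suspect sentence is matched against every source sentence (so sentence order does not matter). The 0.8 threshold is hypothetical. English examples are used for readability; with whitespace tokenization and a Punjabi or Hindi synonym list the same code applies to those scripts.

```python
# Hypothetical synonym list mapping words to a canonical form.
SYNONYMS = {"big": "large", "quick": "fast"}


def normalise(sentence):
    # Whitespace split keeps Gurmukhi/Devanagari vowel signs attached to words.
    return [SYNONYMS.get(w, w) for w in sentence.lower().split()]


def exact_match(a, b):
    """What a simple string-comparison tool checks."""
    return a.strip().lower() == b.strip().lower()


def similarity(a, b):
    """Jaccard overlap of synonym-normalised word sets; ignores word order."""
    sa, sb = set(normalise(a)), set(normalise(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def count_copied_sentences(source, suspect, threshold=0.8):
    """Count suspect sentences matching any source sentence, in any position."""
    return sum(any(similarity(s, t) >= threshold for s in source) for t in suspect)


original = "the big dog runs fast"
rewritten = "fast runs the large dog"   # word switching + synonym replacement
print(exact_match(original, rewritten))  # False
print(similarity(original, rewritten))   # 1.0
```

A set-based comparison ignores repeated words and grammatical reordering that changes meaning, so a practical tool would combine it with finer-grained checks.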
2017
Tutorial for Deaf – Teaching Punjabi Alphabet using Synthetic Animations
Lalit Goyal, Vishal Goyal
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)
2016
Automatic Translation of English Text to Indian Sign Language Synthetic Animations
Lalit Goyal, Vishal Goyal
Proceedings of the 13th International Conference on Natural Language Processing
2012
Named Entity Recognition System for Urdu
UmrinderPal Singh, Vishal Goyal, Gurpreet Singh Lehal
Proceedings of COLING 2012
Rule Based Hindi Part of Speech Tagger
Navneet Garg, Vishal Goyal, Suman Preet
Proceedings of COLING 2012: Demonstration Papers
Rule Based Urdu Stemmer
Rohit Kansal, Vishal Goyal, Gurpreet Singh Lehal
Proceedings of COLING 2012: Demonstration Papers
2011
Hindi to Punjabi Machine Translation System
Vishal Goyal, Gurpreet Singh Lehal
Proceedings of the ACL-HLT 2011 System Demonstrations