Devadath V V
2016
Significance of an Accurate Sandhi-Splitter in Shallow Parsing of Dravidian Languages
Devadath V V, Dipti Misra Sharma
Proceedings of the ACL 2016 Student Research Workshop
Align Me: A framework to generate Parallel Corpus Using OCRs and Bilingual Dictionaries
Priyam Bakliwal, Devadath V V, C V Jawahar
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
Multilingual language processing tasks such as statistical machine translation and cross-language information retrieval rely mainly on the availability of accurate parallel corpora. Manually constructing such corpora can be extremely expensive and time-consuming. In this paper, we present a simple yet efficient method to generate large amounts of reasonably accurate parallel corpora with minimal user effort. We exploit the large number of English books that have been translated into other languages to build parallel corpora. Optical Character Recognition (OCR) systems are used to digitize such books. We propose a robust dictionary-based parallel corpus generation system for aligning multilingual text at different levels of granularity (sentences, paragraphs, etc.). We evaluate the proposed method on a manually aligned dataset of 300 Hindi-English sentences and 100 English-Malayalam sentences.
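The abstract does not specify how the dictionary is used to score or align text, so the following is only a minimal sketch of one common dictionary-based approach, not the authors' system. It assumes naive lowercased whitespace tokenization, scores a sentence pair by the fraction of source words whose dictionary translation appears in the target sentence, and finds a monotonic 1-1 alignment (allowing unmatched sentences on either side, e.g. OCR noise) by dynamic programming. The names `dictionary_score`, `align`, and the `min_score` threshold, as well as the toy romanized Hindi lexicon, are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def dictionary_score(src: str, tgt: str, lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source tokens with at least one dictionary translation in tgt.

    Assumption: naive lowercased whitespace tokenization on both sides.
    """
    src_tokens = src.lower().split()
    tgt_tokens = set(tgt.lower().split())
    if not src_tokens:
        return 0.0
    hits = sum(1 for w in src_tokens if lexicon.get(w, set()) & tgt_tokens)
    return hits / len(src_tokens)


def align(src_sents: List[str], tgt_sents: List[str],
          lexicon: Dict[str, Set[str]],
          min_score: float = 0.3) -> List[Tuple[int, int]]:
    """Monotonic 1-1 alignment with skips that maximizes the total score.

    Pairs scoring below the (hypothetical) min_score threshold are never linked.
    """
    n, m = len(src_sents), len(tgt_sents)
    # best[i][j]: best total score for aligning src_sents[:i] with tgt_sents[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:  # leave source sentence i-1 unaligned
                candidates.append((best[i - 1][j], "skip_src"))
            if j > 0:  # leave target sentence j-1 unaligned
                candidates.append((best[i][j - 1], "skip_tgt"))
            if i > 0 and j > 0:  # link source i-1 with target j-1
                s = dictionary_score(src_sents[i - 1], tgt_sents[j - 1], lexicon)
                if s >= min_score:
                    candidates.append((best[i - 1][j - 1] + s, "match"))
            best[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Trace back from the bottom-right cell to recover the linked pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))


if __name__ == "__main__":
    # Toy English -> romanized Hindi lexicon (illustrative only).
    lexicon = {"boy": {"ladka"}, "reads": {"padhta"}, "book": {"kitaab"},
               "raining": {"baarish"}, "today": {"aaj"}}
    en = ["the boy reads a book", "it is raining today"]
    # The first target line stands in for an unmatched OCR fragment.
    hi = ["pustak ka pehla adhyay", "ladka kitaab padhta hai", "aaj baarish ho rahi hai"]
    print(align(en, hi, lexicon))  # [(0, 1), (1, 2)]
```

A monotonic DP is used here because translated books usually preserve sentence order; real systems typically also allow 1-2 and 2-1 merges, which this sketch omits.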
2014
A Sandhi Splitter for Malayalam
Devadath V V, Litton J Kurisinkel, Dipti Misra Sharma, Vasudeva Varma
Proceedings of the 11th International Conference on Natural Language Processing