Manpreet Singh Lehal


2020

EXTRACTING PARALLEL PHRASES FROM COMPARABLE ENGLISH AND PUNJABI CORPORA USING AN INTEGRATED APPROACH
Manpreet Singh Lehal | Vishal Goyal
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations

Machine translation from English to Indian languages is difficult because good-quality corpora are unavailable and Indian languages are morphologically rich. To produce better translations, a system needs a large corpus. We employ three similarity and distance measures and have developed software that automatically extracts parallel data from comparable corpora with high precision using minimal resources. The software relies on four algorithms. Three of them compute Cosine Similarity, Euclidean Distance Similarity, and Jaccard Similarity. The fourth integrates the outputs of the other three to improve the efficiency of the system.
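The three measures named in the abstract, and one possible way of combining them, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say how the outputs are integrated, so the majority vote in `integrated_match` is an assumption, as are the `threshold` and `min_votes` parameters, the `1 / (1 + distance)` conversion of Euclidean distance into a similarity, and the whitespace tokenizer. The sketch also assumes both phrases have already been mapped into a common token space (for example, by translating one side), which the abstract does not describe.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between the two term-frequency vectors.
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def euclidean_similarity(a, b):
    # Euclidean distance between term-frequency vectors, mapped to (0, 1].
    # The 1 / (1 + d) mapping is an assumption, not taken from the paper.
    va, vb = Counter(a), Counter(b)
    vocab = set(va) | set(vb)
    dist = math.sqrt(sum((va[t] - vb[t]) ** 2 for t in vocab))
    return 1.0 / (1.0 + dist)


def jaccard_similarity(a, b):
    # Size of the token-set intersection divided by the size of the union.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def integrated_match(a, b, threshold=0.5, min_votes=2):
    # Assumed integration step: accept the pair when at least `min_votes`
    # of the three measures reach `threshold`.
    scores = (
        cosine_similarity(a, b),
        euclidean_similarity(a, b),
        jaccard_similarity(a, b),
    )
    votes = sum(score >= threshold for score in scores)
    return votes >= min_votes


if __name__ == "__main__":
    left = tokenize("the cat sat on the mat")
    right = tokenize("the cat sat on a mat")
    print(cosine_similarity(left, right))
    print(euclidean_similarity(left, right))
    print(jaccard_similarity(left, right))
    print(integrated_match(left, right))
```

Requiring agreement between measures is one way to trade recall for precision. A real system would tune the threshold on held-out data and could just as well use a weighted score instead of a vote.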