Karthik Visweswariah

2014

When Transliteration Met Crowdsourcing : An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control
Mitesh M. Khapra | Ananthakrishnan Ramanathan | Anoop Kunchukuttan | Karthik Visweswariah | Pushpak Bhattacharyya
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Sufficient parallel transliteration pairs are needed for training state-of-the-art transliteration engines. Given the cost involved, it is often infeasible to collect such data using experts. Crowdsourcing could be a cheaper alternative, provided that a good quality control (QC) mechanism can be devised for this task. Most QC mechanisms employed in crowdsourcing are aggressive (unfair to workers) and expensive (unfair to requesters). In contrast, we propose a low-cost QC mechanism which is fair to both workers and requesters. At the heart of our mechanism lies a rule-based Transliteration Equivalence approach which takes as input a list of vowels in the two languages and a mapping of the consonants in the two languages. We empirically show that our approach outperforms other popular QC mechanisms (viz., consensus and sampling) on two vital parameters: (i) fairness to requesters (lower cost per correct transliteration) and (ii) fairness to workers (lower rate of rejecting correct answers). Further, as an extrinsic evaluation, we use the standard NEWS 2010 test set and show that such quality-controlled crowdsourced data compares well to expert data when used for training a transliteration engine.

Unsupervised Solution Post Identification from Discussion Forums
Deepak P | Karthik Visweswariah
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
Karthik Visweswariah | Mitesh M. Khapra | Ananthakrishnan Ramanathan
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Improving reordering performance using higher order and structural features
Mitesh M. Khapra | Ananthakrishnan Ramanathan | Karthik Visweswariah
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Semi-Supervised Answer Extraction from Discussion Forums
Rose Catherine | Rashmi Gangadharaiah | Karthik Visweswariah | Dinesh Raghu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Proceedings of the Workshop on Reordering for Statistical Machine Translation
Karthik Visweswariah | Ananthakrishnan Ramanathan | Mitesh M. Khapra
Proceedings of the Workshop on Reordering for Statistical Machine Translation

Whitepaper for Shared Task on Learning Reordering from Word Alignments at RSMT 2012
Mitesh M. Khapra | Ananthakrishnan Ramanathan | Karthik Visweswariah
Proceedings of the Workshop on Reordering for Statistical Machine Translation

Report of the Shared Task on Learning Reordering from Word Alignments at RSMT 2012
Mitesh M. Khapra | Ananthakrishnan Ramanathan | Karthik Visweswariah
Proceedings of the Workshop on Reordering for Statistical Machine Translation

A Comparison of Syntactic Reordering Methods for English-German Machine Translation
Jiří Navrátil | Karthik Visweswariah | Ananthakrishnan Ramanathan
Proceedings of COLING 2012

A Study of Word-Classing for MT Reordering
Ananthakrishnan Ramanathan | Karthik Visweswariah
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

MT systems typically use parsers to help reorder constituents. However, most languages do not have adequate treebank data to learn good parsers, and such training data is extremely time-consuming to annotate. Our earlier work has shown that a reordering model learned from word alignments using part-of-speech (POS) tags as features can improve MT performance (Visweswariah et al., 2011). In this paper, we investigate the effect of word-classing on reordering performance using this model. We show that unsupervised word clusters perform somewhat worse than, but still reasonably well compared to, a POS tagger built with a small amount of annotated data, while a richer tag set including case and gender-number-person further improves reordering performance by around 1.2 monolingual BLEU points. While annotating this richer tag set is more complicated than annotating the base tag set, it is much easier than annotating treebank data.