Mridul Gupta

2011

Extending a probabilistic phrase alignment approach for SMT
Mridul Gupta | Sanjika Hewavitharana | Stephan Vogel
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

Phrase alignment is a crucial step in phrase-based statistical machine translation. We explore a way of improving phrase alignment by adding syntactic information in the form of chunks as soft constraints, guided by an in-depth analysis of a hand-aligned data set. We extend a probabilistic phrase alignment model that extracts phrase pairs by optimizing phrase pair boundaries over the sentence pair [1]. The boundaries of the target phrase are chosen such that the overall sentence alignment probability is optimal. Viterbi alignment information is also added to the extended model with a view to improving phrase alignment. We extract phrase pairs using a relatively large number of features, which are discriminatively trained using a large-margin online learning algorithm, the Margin Infused Relaxed Algorithm (MIRA), which we integrate into our approach. Initial experiments show improvements in both phrase alignment and translation quality for Arabic-English on a moderate-size translation task.

Error Detection for Treebank Validation
Bharat Ram Ambati | Rahul Agarwal | Mridul Gupta | Samar Husain | Dipti Misra Sharma
Proceedings of the 9th Workshop on Asian Language Resources

2010

A High Recall Error Identification Tool for Hindi Treebank Validation
Bharat Ram Ambati | Mridul Gupta | Samar Husain | Dipti Misra Sharma
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes the development of a hybrid tool for semi-automated validation of treebank annotation at various levels. The tool is developed for error detection at the part-of-speech, chunk and dependency levels of a Hindi treebank that is currently under development. The tool aims to identify as many errors as possible at these levels to achieve consistency in the task of annotation. Consistency in treebank annotation is a must for making data as error-free as possible and for providing quality assurance. The tool is aimed at ensuring consistency and making manual validation cost-effective. We discuss a rule-based approach and a hybrid approach (statistical methods combined with rule-based methods) by which a high-recall system can be developed and used to identify errors in the treebank. We report some results of using the tool on a sample of data extracted from the Hindi treebank. We also argue that the tool can prove useful in improving the annotation guidelines, which would in turn improve the quality of annotation in subsequent iterations.

Partial Parsing as a Method to Expedite Dependency Annotation of a Hindi Treebank
Mridul Gupta | Vineet Yadav | Samar Husain | Dipti Misra Sharma
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The paper describes an approach to expedite the manual annotation of a Hindi dependency treebank which is currently under development. We propose a way by which consistency among a set of manual annotators could be improved. Furthermore, we show that our setup can also prove useful for evaluating when an inexperienced annotator is ready to start participating in the production of the treebank. We test our approach on sample sets of data obtained from ongoing work on the creation of this treebank. The results supporting our proposal are reported in this paper. We report results from a semi-automated dependency annotation experiment. We measure the rate of agreement between annotators using Cohen's kappa. We also compare the total time taken to annotate sample data sets using a completely manual approach with the time taken using a semi-automated approach. The results show that this semi-automated approach, when carried out with experienced and trained human annotators, improves the overall quality of treebank annotation and also speeds up the process.
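
As a rough illustration of the agreement measure named in the abstract above, the following is a minimal sketch of Cohen's kappa for two annotators who label the same items. It is not taken from the paper: the function name `cohens_kappa` and the example labels are hypothetical, and the paper's actual experimental setup may differ.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical dependency labels assigned by two annotators to four tokens.
annotator_1 = ["k1", "k1", "k2", "k2"]
annotator_2 = ["k1", "k2", "k2", "k2"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5. Values near 1 indicate agreement well beyond chance, and values near 0 indicate agreement no better than chance.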
