Payal Khullar


Why Find the Right One?
Payal Khullar
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

The present paper investigates the impact of the anaphoric one words in English on the Neural Machine Translation (NMT) process using English-Hindi as source and target language pair. As expected, the experimental results show that the state-of-the-art Google English-Hindi NMT system achieves significantly poorly on sentences containing anaphoric ones as compared to the sentences containing regular, non-anaphoric ones. But, more importantly, we note that amongst the anaphoric words, the noun class is clearly much harder for NMT than the determinatives. This reaffirms the linguistic disparity of the two phenomenon in recent theoretical syntactic literature, despite the obvious surface similarities.

Are Ellipses Important for Machine Translation?
Payal Khullar
Computational Linguistics, Volume 47, Issue 4 - December 2021

Abstract This article describes an experiment to evaluate the impact of different types of ellipses discussed in theoretical linguistics on Neural Machine Translation (NMT), using English to Hindi/Telugu as source and target languages. Evaluation with manual methods shows that most of the errors made by Google NMT are located in the clause containing the ellipsis, the frequency of such errors is slightly more in Telugu than Hindi, and the translation adequacy shows improvement when ellipses are reconstructed with their antecedents. These findings not only confirm the importance of ellipses and their resolution for MT, but also hint toward a possible correlation between the translation of discourse devices like ellipses with the morphological incongruity of the source and target. We also observe that not all ellipses are translated poorly and benefit from reconstruction, advocating for a disparate treatment of different ellipses in MT research.


NoEl: An Annotated Corpus for Noun Ellipsis in English
Payal Khullar | Kushal Majmundar | Manish Shrivastava
Proceedings of the Twelfth Language Resources and Evaluation Conference

Ellipsis resolution has been identified as an important step to improve the accuracy of mainstream Natural Language Processing (NLP) tasks such as information retrieval, event extraction, dialog systems, etc. Previous computational work on ellipsis resolution has focused on one type of ellipsis, namely Verb Phrase Ellipsis (VPE) and a few other related phenomenon. We extend the study of ellipsis by presenting the No(oun)El(lipsis) corpus - an annotated corpus for noun ellipsis and closely related phenomenon using the first hundred movies of Cornell Movie Dialogs Dataset. The annotations are carried out in a standoff annotation scheme that encodes the position of the licensor, the antecedent boundary, and Part-of-Speech (POS) tags of the licensor and antecedent modifier. Our corpus has 946 instances of exophoric and endophoric noun ellipsis, making it the biggest resource of noun ellipsis in English, to the best of our knowledge. We present a statistical study of our corpus with novel insights on the distribution of noun ellipsis, its licensors and antecedents. Finally, we perform the tasks of detection and resolution of noun ellipsis with different classifiers trained on our corpus and report baseline results.

Exploring Statistical and Neural Models for Noun Ellipsis Detection and Resolution in English
Payal Khullar
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop

Computational approaches to noun ellipsis resolution has been sparse, with only a naive rule-based approach that uses syntactic feature constraints for marking noun ellipsis licensors and selecting their antecedents. In this paper, we further the ellipsis research by exploring several statistical and neural models for both the subtasks involved in the ellipsis resolution process and addressing the representation and contribution of manual features proposed in previous research. Using the best performing models, we build an end-to-end supervised Machine Learning (ML) framework for this task that improves the existing F1 score by 16.55% for the detection and 14.97% for the resolution subtask. Our experiments demonstrate robust scores through pretrained BERT (Bidirectional Encoder Representations from Transformers) embeddings for word representation, and more so the importance of manual features– once again highlighting the syntactic and semantic characteristics of the ellipsis phenomenon. For the classification decision, we notice that a simple Multilayar Perceptron (MLP) works well for the detection of ellipsis; however, Recurrent Neural Networks (RNN) are a better choice for the much harder resolution step.

Finding The Right One and Resolving it
Payal Khullar | Arghya Bhattacharya | Manish Shrivastava
Proceedings of the 24th Conference on Computational Natural Language Learning

One-anaphora has figured prominently in theoretical linguistic literature, but computational linguistics research on the phenomenon is sparse. Not only that, the long standing linguistic controversy between the determinative and the nominal anaphoric element one has propagated in the limited body of computational work on one-anaphora resolution, making this task harder than it is. In the present paper, we resolve this by drawing from an adequate linguistic analysis of the word one in different syntactic environments - once again highlighting the significance of linguistic theory in Natural Language Processing (NLP) tasks. We prepare an annotated corpus marking actual instances of one-anaphora with their textual antecedents, and use the annotations to experiment with state-of-the art neural models for one-anaphora resolution. Apart from presenting a strong neural baseline for this task, we contribute a gold-standard corpus, which is, to the best of our knowledge, the biggest resource on one-anaphora till date.


Using Syntax to Resolve NPE in English
Payal Khullar | Allen Antony | Manish Shrivastava
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper describes a novel, syntax-based system for automatic detection and resolution of Noun Phrase Ellipsis (NPE) in English. The system takes in free input English text, detects the site of nominal elision, and if present, selects potential antecedent candidates. The rules are built using the syntactic information on ellipsis and its antecedent discussed in previous theoretical linguistics literature on NPE. Additionally, we prepare a curated dataset of 337 sentences from well-known, reliable sources, containing positive and negative samples of NPE. We split this dataset into two parts, and use one part to refine our rules and the other to test the performance of our final system. We get an F1-score of 76.47% for detection and 70.27% for NPE resolution on the testset. To the best of our knowledge, ours is the first system that detects and resolves NPE in English. The curated dataset used for this task, albeit small, covers a wide variety of NPE cases and will be made public for future work.


Automatic Question Generation using Relative Pronouns and Adverbs
Payal Khullar | Konigari Rachna | Mukul Hase | Manish Shrivastava
Proceedings of ACL 2018, Student Research Workshop

This paper presents a system that automatically generates multiple, natural language questions using relative pronouns and relative adverbs from complex English sentences. Our system is syntax-based, runs on dependency parse information of a single-sentence input, and achieves high accuracy in terms of syntactic correctness, semantic adequacy, fluency and uniqueness. One of the key advantages of our system, in comparison with other rule-based approaches, is that we nearly eliminate the chances of getting a wrong wh-word in the generated question, by fetching the requisite wh-word from the input sentence itself. Depending upon the input, we generate both factoid and descriptive type questions. To the best of our information, the exploitation of wh-pronouns and wh-adverbs to generate questions is novel in the Automatic Question Generation task.