NoEl: An Annotated Corpus for Noun Ellipsis in English

Payal Khullar, Kushal Majmundar, Manish Shrivastava


Abstract
Ellipsis resolution has been identified as an important step to improve the accuracy of mainstream Natural Language Processing (NLP) tasks such as information retrieval, event extraction, and dialog systems. Previous computational work on ellipsis resolution has focused on one type of ellipsis, namely Verb Phrase Ellipsis (VPE), and a few other related phenomena. We extend the study of ellipsis by presenting the No(oun)El(lipsis) corpus, an annotated corpus for noun ellipsis and closely related phenomena built from the first hundred movies of the Cornell Movie Dialogs Dataset. The annotations are carried out in a standoff annotation scheme that encodes the position of the licensor, the antecedent boundary, and Part-of-Speech (POS) tags of the licensor and antecedent modifier. Our corpus has 946 instances of exophoric and endophoric noun ellipsis, making it the biggest resource of noun ellipsis in English, to the best of our knowledge. We present a statistical study of our corpus with novel insights on the distribution of noun ellipsis, its licensors and antecedents. Finally, we perform the tasks of detection and resolution of noun ellipsis with different classifiers trained on our corpus and report baseline results.
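To make the standoff scheme described in the abstract more concrete, the following is a minimal sketch of how one annotation record might be represented. The class name, field names, the token-offset convention, and the example sentence are all hypothetical illustrations; the abstract does not specify the corpus's actual file format, and this is not taken from it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NounEllipsisAnnotation:
    """Hypothetical standoff record; field names are illustrative only."""
    licensor_index: int                          # token position of the licensor
    licensor_pos: str                            # POS tag of the licensor
    antecedent_span: Optional[Tuple[int, int]]   # [start, end) token span; None if exophoric
    antecedent_modifier_pos: Optional[str]       # POS tag of the antecedent's modifier, if any

    @property
    def is_endophoric(self) -> bool:
        # Endophoric ellipsis has its antecedent inside the text.
        return self.antecedent_span is not None


def resolve(tokens: List[str], ann: NounEllipsisAnnotation) -> Optional[str]:
    """Return the antecedent text for an endophoric record, else None."""
    if ann.antecedent_span is None:
        return None
    start, end = ann.antecedent_span
    return " ".join(tokens[start:end])


# Made-up sentence (not from the corpus): "three" licenses the elided noun "books".
tokens = "I bought two books and she bought three .".split()
ann = NounEllipsisAnnotation(
    licensor_index=7,
    licensor_pos="CD",
    antecedent_span=(3, 4),
    antecedent_modifier_pos="CD",
)
antecedent = resolve(tokens, ann)
```

Because the record stores only offsets and tags, the source text stays untouched, which is the usual motivation for standoff annotation. An exophoric instance would, under the same assumptions, simply carry no antecedent span.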
Anthology ID:
2020.lrec-1.5
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
34–43
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.5
Cite (ACL):
Payal Khullar, Kushal Majmundar, and Manish Shrivastava. 2020. NoEl: An Annotated Corpus for Noun Ellipsis in English. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 34–43, Marseille, France. European Language Resources Association.
Cite (Informal):
NoEl: An Annotated Corpus for Noun Ellipsis in English (Khullar et al., LREC 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.lrec-1.5.pdf