Anaphoric Zero Pronoun Identification: A Multilingual Approach

Abdulrahman Aloraini, Massimo Poesio


Abstract
Pro-drop languages such as Arabic, Chinese, Italian, or Japanese allow morphologically null but referential arguments, called anaphoric zero pronouns, in certain syntactic positions. Much NLP work on anaphoric zero pronouns (AZPs) is based on gold mentions, but models for their identification are a fundamental prerequisite for their resolution in real-life applications. Such identification requires complex language understanding and knowledge of real-world entities. Transfer learning models such as BERT have recently been shown to learn surface, syntactic, and semantic information, which can be very useful in recognizing AZPs. We propose a BERT-based multilingual model for AZP identification from predicted zero-pronoun positions, and evaluate it on the Arabic and Chinese portions of OntoNotes 5.0. As far as we know, this is the first neural network model of AZP identification for Arabic, and our approach outperforms the state of the art for Chinese. Experimental results suggest that BERT implicitly encodes information about AZPs through their surrounding context.
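The abstract describes classifying candidate zero-pronoun positions using a BERT-style multilingual encoder. A minimal sketch of that setup is shown below; the head architecture, hidden size, and the use of random vectors as stand-ins for encoder output are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class AZPIdentifier(nn.Module):
    """Binary classifier over candidate zero-pronoun (gap) positions.

    Assumes a multilingual encoder (e.g. multilingual BERT) has already
    produced one contextual vector per candidate position; the plain
    feed-forward head here is an illustrative simplification.
    """
    def __init__(self, hidden_size=768):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # class 0 = not an AZP, class 1 = AZP
        )

    def forward(self, candidate_vectors):
        # candidate_vectors: (num_candidates, hidden_size)
        return self.classifier(candidate_vectors)

# Stand-in for encoder output: 5 candidate gap positions, 768-dim vectors
# (the hidden size of BERT-base); real usage would take these from the
# encoder's contextual representations of the tokens around each gap.
encoder_out = torch.randn(5, 768)
model = AZPIdentifier()
logits = model(encoder_out)          # shape: (5, 2)
preds = logits.argmax(dim=-1)        # one AZP/non-AZP decision per candidate
```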
Anthology ID:
2020.crac-1.3
Volume:
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Month:
December
Year:
2020
Address:
Barcelona, Spain (online)
Venue:
CRAC
Publisher:
Association for Computational Linguistics
Note:
Pages:
22–32
URL:
https://aclanthology.org/2020.crac-1.3
Cite (ACL):
Abdulrahman Aloraini and Massimo Poesio. 2020. Anaphoric Zero Pronoun Identification: A Multilingual Approach. In Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference, pages 22–32, Barcelona, Spain (online). Association for Computational Linguistics.
Cite (Informal):
Anaphoric Zero Pronoun Identification: A Multilingual Approach (Aloraini & Poesio, CRAC 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.crac-1.3.pdf