Multilingual Constituency Parsing with Self-Attention and Pre-Training

Nikita Kitaev; Steven Cao; Dan Klein

doi:10.18653/v1/P19-1340

Abstract

We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find that pre-training is beneficial across all 11 languages tested; however, large model sizes (more than 100 million parameters) make it computationally expensive to train separate models for each language. To address this shortcoming, we show that joint multilingual pre-training and fine-tuning allows sharing all but a small number of parameters between ten languages in the final model. The 10x reduction in model size compared to fine-tuning one model per language causes only a 3.2% relative error increase in aggregate. We further explore the idea of joint fine-tuning and show that it gives low-resource languages a way to benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).
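The "relative error increase" quoted above can be read as a change in parsing error, where error is taken as 100 minus F1. The sketch below shows that arithmetic under that assumption. The function name and the F1 values are hypothetical and are not taken from the paper, and the paper's exact aggregation across languages is not specified here.

```python
def relative_error_increase(f1_baseline: float, f1_new: float) -> float:
    """Return the relative increase in error when moving from a baseline
    model to a new model, with error defined as (100 - F1).

    Assumption: F1 scores are on a 0-100 scale and the baseline F1 is
    below 100, so the baseline error is non-zero.
    """
    err_baseline = 100.0 - f1_baseline
    err_new = 100.0 - f1_new
    if err_baseline <= 0.0:
        raise ValueError("baseline error must be positive")
    return (err_new - err_baseline) / err_baseline


if __name__ == "__main__":
    # Hypothetical scores for illustration only (not results from the paper):
    # a per-language model at 92.0 F1 versus a shared model at 91.8 F1.
    increase = relative_error_increase(92.0, 91.8)
    print(f"relative error increase: {increase:.1%}")
```

A small drop in F1 can correspond to a noticeably larger relative change in error when the baseline is already strong, which is why results at this accuracy level are often reported as relative error.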

Anthology ID: P19-1340
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Editors: Anna Korhonen, David Traum, Lluís Màrquez
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 3499–3505
URL: https://preview.aclanthology.org/fix-sig-urls/P19-1340/
DOI: 10.18653/v1/P19-1340
Cite (ACL): Nikita Kitaev, Steven Cao, and Dan Klein. 2019. Multilingual Constituency Parsing with Self-Attention and Pre-Training. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3499–3505, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Multilingual Constituency Parsing with Self-Attention and Pre-Training (Kitaev et al., ACL 2019)
PDF: https://preview.aclanthology.org/fix-sig-urls/P19-1340.pdf
Video: https://preview.aclanthology.org/fix-sig-urls/P19-1340.mp4
Code: nikitakit/self-attentive-parser + additional community code
