Abstract
We compare two orthogonal semi-supervised learning techniques, namely tri-training and pretrained word embeddings, in the task of dependency parsing. We explore language-specific FastText and ELMo embeddings and multilingual BERT embeddings. We focus on a low-resource scenario, as semi-supervised learning can be expected to have the most impact here. Based on treebank size and available ELMo models, we select Hungarian, Uyghur (a zero-shot language for mBERT) and Vietnamese. Furthermore, we include English in a simulated low-resource setting. We find that pretrained word embeddings make more effective use of unlabelled data than tri-training, but that the two approaches can be successfully combined.
- Anthology ID:
- 2021.emnlp-main.745
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9457–9473
- URL:
- https://aclanthology.org/2021.emnlp-main.745
- DOI:
- 10.18653/v1/2021.emnlp-main.745
- Cite (ACL):
- Joachim Wagner and Jennifer Foster. 2021. Revisiting Tri-training of Dependency Parsers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9457–9473, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Revisiting Tri-training of Dependency Parsers (Wagner & Foster, EMNLP 2021)
- PDF:
- https://aclanthology.org/2021.emnlp-main.745.pdf
- Code:
- jowagner/mtb-tri-training + additional community code
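
For readers unfamiliar with the tri-training scheme the abstract refers to, the following is a minimal, generic sketch of the classic algorithm (Zhou & Li, 2005) using toy scikit-learn classifiers. It is not the paper's setup: Wagner and Foster apply the idea to dependency parsers, where agreement is defined over predicted trees (see the linked jowagner/mtb-tri-training repository), and the classic algorithm additionally gates the addition of pseudo-labelled data on estimated error rates, which this sketch omits for brevity. The function and parameter names below (`tri_train`, `predict_vote`, `rounds`) are illustrative, not the paper's code.

```python
# Generic tri-training sketch (Zhou & Li, 2005) with toy classifiers.
# Assumes numpy arrays and integer class labels; not the paper's parser code.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tri_train(X_lab, y_lab, X_unlab, rounds=3, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Train three learners on independent bootstrap samples of the
    #    labelled data, so that they make partially independent errors.
    learners, pools_X, pools_y = [], [], []
    for i in range(3):
        idx = rng.integers(0, len(X_lab), len(X_lab))
        pools_X.append(X_lab[idx])
        pools_y.append(y_lab[idx])
        learners.append(
            DecisionTreeClassifier(random_state=i).fit(pools_X[i], pools_y[i])
        )
    for _ in range(rounds):
        preds = [m.predict(X_unlab) for m in learners]
        new_X, new_y = [None] * 3, [None] * 3
        for i in range(3):
            j, k = (i + 1) % 3, (i + 2) % 3
            # 2. Where the other two learners agree on an unlabelled
            #    example, treat their shared label as pseudo-gold
            #    training data for learner i.
            agree = preds[j] == preds[k]
            new_X[i] = X_unlab[agree]
            new_y[i] = preds[j][agree]
        # 3. Retrain each learner on its original bootstrap sample plus
        #    the automatically labelled examples.
        for i in range(3):
            if len(new_X[i]):
                Xi = np.vstack([pools_X[i], new_X[i]])
                yi = np.concatenate([pools_y[i], new_y[i]])
                learners[i] = DecisionTreeClassifier(random_state=i).fit(Xi, yi)
    return learners

def predict_vote(learners, X):
    # Final prediction by majority vote over the three learners.
    votes = np.stack([m.predict(X) for m in learners])
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```

Usage follows the usual fit/predict pattern, e.g. `learners = tri_train(X_lab, y_lab, X_unlab)` followed by `predict_vote(learners, X_test)`. In the dependency-parsing instantiation studied in the paper, the classifiers are replaced by parsers and the agreement check operates on whole parses rather than single labels.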