Semi-supervised Domain Adaptation for Dependency Parsing

Zhenghua Li, Xue Peng, Min Zhang, Rui Wang, Luo Si


Abstract
Over the past decades, due to the lack of sufficient labeled data, most studies on cross-domain parsing have focused on unsupervised domain adaptation, assuming there is no target-domain training data. However, unsupervised approaches have made limited progress so far due to the intrinsic difficulty of both domain adaptation and parsing. This paper tackles the semi-supervised domain adaptation problem for Chinese dependency parsing, based on two newly annotated large-scale domain-aware datasets. We propose a simple domain embedding approach to merge the source- and target-domain training data, which is shown to be more effective than both direct corpus concatenation and multi-task learning. To utilize unlabeled target-domain data, we employ recent contextualized word representations and show that a simple fine-tuning procedure can further boost cross-domain parsing accuracy by a large margin.
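The domain embedding idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common realization: each sentence carries a domain id, and the vector for that id is concatenated onto every word vector before the parser's encoder. The abstract does not specify these details, so the names `merge_corpora`, `encode_inputs`, `DOMAINS`, the dimensions, and the toy sentences are all hypothetical, and the randomly initialized tables stand in for parameters that would be learned in a real parser.

```python
import random

# Hypothetical domain labels; the abstract only says source and target data are merged.
DOMAINS = {"source": 0, "target": 1}
WORD_DIM, DOMAIN_DIM = 4, 2  # illustrative sizes, not taken from the paper


def make_table(num_rows, dim, seed):
    """Build a randomly initialized embedding table (stand-in for learned parameters)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def merge_corpora(source_sents, target_sents):
    """Pool both corpora, tagging every sentence with its domain id."""
    merged = [(sent, DOMAINS["source"]) for sent in source_sents]
    merged += [(sent, DOMAINS["target"]) for sent in target_sents]
    return merged


def encode_inputs(sentence, domain_id, word_table, domain_table, vocab):
    """Concatenate each word vector with the vector of the sentence's domain."""
    domain_vec = domain_table[domain_id]
    return [word_table[vocab[word]] + domain_vec for word in sentence]


# Toy placeholder sentences; real data would be treebank sentences.
source_sents = [["the", "market", "rose"], ["the", "bank", "closed"]]
target_sents = [["the", "phone", "works"]]

words = sorted({w for sent in source_sents + target_sents for w in sent})
vocab = {w: i for i, w in enumerate(words)}
word_table = make_table(len(vocab), WORD_DIM, seed=0)
domain_table = make_table(len(DOMAINS), DOMAIN_DIM, seed=1)

merged = merge_corpora(source_sents, target_sents)
src_inputs = encode_inputs(merged[0][0], merged[0][1], word_table, domain_table, vocab)
tgt_inputs = encode_inputs(merged[2][0], merged[2][1], word_table, domain_table, vocab)
```

In this sketch the same word receives the same word vector in both domains, while the appended domain vector differs, which lets a single shared model condition on where each sentence came from. Direct corpus concatenation would correspond to dropping the domain vector altogether.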
Anthology ID:
P19-1229
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2386–2395
URL:
https://aclanthology.org/P19-1229
DOI:
10.18653/v1/P19-1229
Cite (ACL):
Zhenghua Li, Xue Peng, Min Zhang, Rui Wang, and Luo Si. 2019. Semi-supervised Domain Adaptation for Dependency Parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2386–2395, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Semi-supervised Domain Adaptation for Dependency Parsing (Li et al., ACL 2019)
PDF:
https://preview.aclanthology.org/fix-dup-bibkey/P19-1229.pdf
Code:
SUDA-LA/ACL2019-dp-cross-domain