Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples

Vidur Joshi, Matthew Peters, Mark Hopkins


Abstract
We revisit domain adaptation for parsers in the neural era. First, we show that recent advances in word representations greatly diminish the need for domain adaptation when the target domain is syntactically similar to the source domain. As evidence, we train a parser on the Wall Street Journal alone that achieves over 90% F1 on the Brown corpus. For more syntactically distant domains, we provide a simple way to adapt a parser using only dozens of partial annotations. For instance, we increase the percentage of error-free geometry-domain parses in a held-out set from 45% to 73% using approximately five dozen training examples. In the process, we demonstrate a new state-of-the-art single-model result on the Wall Street Journal test set of 94.3%. This is an absolute increase of 1.7% over the previous state-of-the-art of 92.6%.
Anthology ID:
P18-1110
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1190–1199
URL:
https://aclanthology.org/P18-1110
DOI:
10.18653/v1/P18-1110
Cite (ACL):
Vidur Joshi, Matthew Peters, and Mark Hopkins. 2018. Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1190–1199, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples (Joshi et al., ACL 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/P18-1110.pdf
Presentation:
P18-1110.Presentation.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-3/P18-1110.mp4
Code:
vidurj/parser-adaptation
Data:
Penn Treebank