Abstract
Neural constituency parsers have reached practical performance on news-domain benchmarks. However, their generalization ability to other domains remains weak. Existing findings on cross-domain constituency parsing are based on only a limited number of domains. Tracking this, we manually annotate a high-quality constituency treebank containing five domains. We analyze challenges to open-domain constituency parsing using a set of linguistic features on various strong constituency parsers. Primarily, we find that 1) BERT significantly increases parsers' cross-domain performance by reducing their sensitivity to domain-variant features; 2) compared with single metrics such as unigram distribution and OOV rate, challenges to open-domain constituency parsing arise from complex features, including cross-domain lexical and constituent structure variations.
- Anthology ID:
- 2022.findings-acl.11
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Smaranda Muresan, Preslav Nakov, Aline Villavicencio
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 112–127
- URL:
- https://aclanthology.org/2022.findings-acl.11
- DOI:
- 10.18653/v1/2022.findings-acl.11
- Cite (ACL):
- Sen Yang, Leyang Cui, Ruoxi Ning, Di Wu, and Yue Zhang. 2022. Challenges to Open-Domain Constituency Parsing. In Findings of the Association for Computational Linguistics: ACL 2022, pages 112–127, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Challenges to Open-Domain Constituency Parsing (Yang et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2022.findings-acl.11.pdf
- Code
- ringos/multi-domain-parsing-analysis + additional community code
- Data
- Penn Treebank