Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)

Caio Corro


Abstract
We introduce a novel chart-based algorithm for span-based parsing of discontinuous constituency trees of block degree two, including ill-nested structures. In particular, we show that we can build variants of our parser with smaller search spaces and time complexities ranging from O(n^6) down to O(n^3). The cubic-time variant covers 98% of constituents observed in linguistic treebanks while having the same complexity as continuous constituency parsers. We evaluate our approach on German and English treebanks (Negra, Tiger, and DPTB) and report state-of-the-art results in the fully supervised setting. We also experiment with pre-trained word embeddings and BERT-based neural networks.
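As a reading aid for the term "block degree" used in the abstract, the following minimal Python sketch counts the maximal contiguous runs of word positions covered by a constituent. It is not the parser described in the paper; the function name block_degree and the example position lists are hypothetical and chosen only for illustration. A continuous constituent has block degree one, and a constituent with a single gap has block degree two.

```python
def block_degree(positions):
    """Return the number of maximal contiguous runs in a set of word positions.

    Illustrative helper only (hypothetical name); it is not taken from the paper.
    """
    ordered = sorted(set(positions))
    if not ordered:
        return 0
    # A new block starts wherever two consecutive covered positions are not adjacent.
    gaps = sum(1 for a, b in zip(ordered, ordered[1:]) if b != a + 1)
    return gaps + 1


# Made-up examples: word indices covered by a constituent.
continuous = [2, 3, 4]        # one block: 2..4
one_gap = [0, 1, 4, 5]        # two blocks: 0..1 and 4..5
two_gaps = [0, 2, 3, 6]       # three blocks: 0, 2..3, 6

print(block_degree(continuous))
print(block_degree(one_gap))
print(block_degree(two_gaps))
```

Under this definition, the trees targeted by the abstract contain only constituents for which this count is at most two.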
Anthology ID:
2020.emnlp-main.219
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2753–2764
URL:
https://aclanthology.org/2020.emnlp-main.219
DOI:
10.18653/v1/2020.emnlp-main.219
Cite (ACL):
Caio Corro. 2020. Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3). In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2753–2764, Online. Association for Computational Linguistics.
Cite (Informal):
Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3) (Corro, EMNLP 2020)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2020.emnlp-main.219.pdf
Optional supplementary material:
 2020.emnlp-main.219.OptionalSupplementaryMaterial.pdf
Video:
 https://slideslive.com/38938664