Sequence Labeling for Constituent Parsing: A Comparative Study and Encoding Innovations

Diego Roca, David Vilares, Carlos Gómez-Rodríguez


Abstract
Various encodings have been proposed to cast constituent parsing in terms of a sequence labeling task. However, unlike in the case of dependency parsing, existing comparisons have not been entirely homogeneous and, to the best of our knowledge, there is no systematic evaluation of these encodings under uniform configurations. A homogeneous evaluation needs to account for various aspects that could influence results, either by controlling for these aspects to ensure uniformity (e.g., network architecture, parameter settings, postprocessing of ill-formed output), or by systematically analyzing their impact (e.g., the impact of binary versus arbitrary structures). In this article, we: (1) compare different encodings comprehensively both theoretically and empirically, on a modern neural architecture and across nine languages, and (2) introduce new encodings and variants, including an encoding that our analysis finds particularly accurate and compact.
Anthology ID:
2026.cl-2.3
Volume:
Computational Linguistics, Volume 52, Issue 2 - June 2026
Month:
June
Year:
2026
Address:
Cambridge, MA
Venue:
CL
Publisher:
MIT Press
Pages:
495–539
URL:
https://preview.aclanthology.org/codex___ingest-cl-2026-issue-2/2026.cl-2.3/
DOI:
10.1162/coli.a.603
Cite (ACL):
Diego Roca, David Vilares, and Carlos Gómez-Rodríguez. 2026. Sequence Labeling for Constituent Parsing: A Comparative Study and Encoding Innovations. Computational Linguistics, 52(2):495–539.
Cite (Informal):
Sequence Labeling for Constituent Parsing: A Comparative Study and Encoding Innovations (Roca et al., CL 2026)
PDF:
https://preview.aclanthology.org/codex___ingest-cl-2026-issue-2/2026.cl-2.3.pdf