Diego Roca
2026
Sequence Labeling for Constituent Parsing: A Comparative Study and Encoding Innovations
Diego Roca | David Vilares | Carlos Gómez-Rodríguez
Computational Linguistics, Volume 52, Issue 2 - June 2026
Various encodings have been proposed to cast constituent parsing in terms of a sequence labeling task. However, unlike in the case of dependency parsing, existing comparisons have not been entirely homogeneous and, to the best of our knowledge, there is no systematic evaluation of these encodings under uniform configurations. A homogeneous evaluation needs to account for various aspects that could influence results, either by controlling for these aspects to ensure uniformity (e.g., network architecture, parameter settings, postprocessing of ill-formed output), or by systematically analyzing their impact (e.g., the impact of binary versus arbitrary structures). In this article, we: (1) compare different encodings comprehensively both theoretically and empirically, on a modern neural architecture and across nine languages, and (2) introduce new encodings and variants, including an encoding that our analysis finds particularly accurate and compact.
2023
4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees
Carlos Gómez-Rodríguez | Diego Roca | David Vilares
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
We introduce an encoding for parsing as sequence labeling that can represent any projective dependency tree as a sequence of 4-bit labels, one per word. The bits in each word’s label represent (1) whether it is a right or left dependent, (2) whether it is the outermost (left/right) dependent of its parent, (3) whether it has any left children and (4) whether it has any right children. We show that this provides an injective mapping from trees to labels that can be encoded and decoded in linear time. We then define a 7-bit extension that represents an extra plane of arcs, extending the coverage to almost full non-projectivity (over 99.9% empirical arc coverage). Results on a set of diverse treebanks show that our 7-bit encoding obtains substantial accuracy gains over the previously best-performing sequence labeling encodings.
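As an illustration of the four bits described in the abstract, the following is a minimal sketch of the encoding direction only (tree to labels). It is not the authors' implementation. Several details are assumptions made for this example: the function name `encode_4bit`, the input format (a list of 1-based head indices, with 0 standing for a dummy root placed to the left of the sentence), the order of the bits, and the polarity of the first bit (True meaning "right dependent"). The sketch also favors readability over the linear-time bound stated in the abstract.

```python
from typing import Dict, List, Tuple

# (is_right_dependent, is_outermost_on_its_side, has_left_children, has_right_children)
Label = Tuple[bool, bool, bool, bool]


def encode_4bit(heads: List[int]) -> List[Label]:
    """Compute one 4-bit label per word for a dependency tree.

    heads[i - 1] is the head of word i (words are numbered from 1).
    A head of 0 denotes a hypothetical dummy root to the left of the
    sentence, so the syntactic root counts as a right dependent of it.
    """
    n = len(heads)

    # Collect the dependents of every node, including the dummy root 0.
    children: Dict[int, List[int]] = {node: [] for node in range(n + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    labels: List[Label] = []
    for dep, head in enumerate(heads, start=1):
        # Bit 1: the word sits to the right of its head.
        is_right = head < dep

        # Bit 2: the word is the farthest dependent of its head on that side.
        same_side = [c for c in children[head] if (head < c) == is_right]
        outermost = dep == (max(same_side) if is_right else min(same_side))

        # Bits 3 and 4: the word has dependents of its own on each side.
        has_left = any(c < dep for c in children[dep])
        has_right = any(c > dep for c in children[dep])

        labels.append((is_right, outermost, has_left, has_right))
    return labels


# "She eats fish": "eats" is the root, with one dependent on each side.
print(encode_4bit([2, 0, 2]))
```

For the three-word example, each word receives exactly one label, and the root word is the only one with both child bits set. Decoding the labels back into a tree, and the 7-bit extension for a second plane of arcs, are not covered by this sketch.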