Alberto Muñoz-Ortiz


2021

pdf bib
Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing
Alberto Muñoz-Ortiz | Michalina Strzyz | David Vilares
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Different linearizations have been proposed to cast dependency parsing as sequence labeling and solve the task as: (i) a head selection problem, (ii) finding a representation of the token arcs as bracket strings, or (iii) associating partial transition sequences of a transition-based parser to words. Yet, there is little understanding about how these linearizations behave in low-resource setups. Here, we first study their data efficiency, simulating data-restricted setups from a diverse set of rich-resource treebanks. Second, we test whether such differences manifest in truly low-resource setups. The results show that head selection encodings are more data-efficient and perform better in an ideal (gold) framework, but that such advantage greatly vanishes in favour of bracketing formats when the running setup resembles a real-world low-resource configuration.