Explicit Edge Length Coding to Improve Long Sentence Parsing Performance

Khensa Daoudi; Mathieu Dehouck; Rayan Ziane; Natasha Romanova

Abstract

Syntactic parsers perform worse on longer sentences. Part of this drop can be explained by the tendency of longer sentences to be more syntactically complex and by the larger number of candidate governors, but part of it is due to longer sentences being more challenging to encode. This is especially relevant in low-resource scenarios such as parsing written sources in historical languages (e.g. medieval and early-modern European languages), in particular legal texts, where sentences can be very long while training material remains limited. In this paper, we present a new method that explicitly uses arc length information to bias the scores produced by a graph-based parser. In a series of experiments on Norman and Gascon data, in which we divide the test data according to sentence length, we show that explicit length coding helps retain parsing performance on longer sentences.
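The abstract does not spell out how arc length enters the scoring, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a graph-based parser that produces a matrix `scores[h][d]` for head `h` governing dependent `d`, and adds a bias that depends on the bucketed distance `|h - d|`. The names `bucket`, `add_length_bias` and `length_bias`, as well as the bucketing scheme and the bias values, are hypothetical.

```python
def bucket(distance, max_bucket):
    """Map an absolute arc length to a bucket index, capping long arcs."""
    return min(distance, max_bucket)


def add_length_bias(scores, length_bias):
    """Return a copy of `scores` with a per-length bias added to every arc.

    scores[h][d]   : score for head h governing dependent d (assumed input)
    length_bias[k] : hypothetical bias for arcs whose length falls in bucket k;
                     in a real parser these values would be learned.
    """
    max_bucket = len(length_bias) - 1
    n = len(scores)
    return [
        [scores[h][d] + length_bias[bucket(abs(h - d), max_bucket)]
         for d in range(n)]
        for h in range(n)
    ]


# Toy 3-token example with made-up numbers.
scores = [
    [0.0, 1.0, 2.0],
    [0.5, 0.0, 1.5],
    [3.0, 0.2, 0.0],
]
length_bias = [0.0, 0.5, -1.0]  # buckets: length 0, length 1, length >= 2

biased = add_length_bias(scores, length_bias)
```

In this toy setting, arcs of length 1 are slightly favoured and arcs of length 2 or more are penalised; the biased matrix would then be passed to the usual tree decoding step in place of the raw scores.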

Anthology ID: 2025.lowresnlp-1.11
Volume: Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Ernesto Luis Estevanell-Valladares, Alicia Picazo-Izquierdo, Tharindu Ranasinghe, Besik Mikaberidze, Simon Ostermann, Daniil Gurgurov, Philipp Mueller, Claudia Borg, Marián Šimko
Venues: LowResNLP | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 102–110
URL: https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.11/
Cite (ACL): Khensa Daoudi, Mathieu Dehouck, Rayan Ziane, and Natasha Romanova. 2025. Explicit Edge Length Coding to Improve Long Sentence Parsing Performance. In Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages, pages 102–110, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Explicit Edge Length Coding to Improve Long Sentence Parsing Performance (Daoudi et al., LowResNLP 2025)
PDF: https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.11.pdf
