Are Translated Texts Useful for Gradient Word Order Extraction?

Amanda Kann


Abstract
Gradient, token-level measures of word order preferences within a language are useful both for cross-linguistic comparison in linguistic typology and for multilingual NLP applications. However, such measures might not be representative of general language use when extracted from translated corpora, due to noise introduced by structural effects of translation. We attempt to quantify this uncertainty in a case study of subject/verb order statistics extracted from a parallel corpus of parliamentary speeches in 21 European languages. We find that word order proportions in translated texts generally resemble those extracted from non-translated texts, but tend to skew somewhat toward the dominant word order of the target language. We also investigate the potential presence of underlying source language-specific effects, but find that they do not sufficiently explain the variation across translations.
Anthology ID:
2025.sigtyp-1.17
Volume:
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Michael Hahn, Priya Rani, Ritesh Kumar, Andreas Shcherbakov, Alexey Sorokin, Oleg Serikov, Ryan Cotterell, Ekaterina Vylomova
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
177–182
URL:
https://preview.aclanthology.org/landing_page/2025.sigtyp-1.17/
Cite (ACL):
Amanda Kann. 2025. Are Translated Texts Useful for Gradient Word Order Extraction?. In Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 177–182, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Are Translated Texts Useful for Gradient Word Order Extraction? (Kann, SIGTYP 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.sigtyp-1.17.pdf