Portho: A Corpus-Based Resource of Orthographic Neighbors in European Portuguese

Eugénio Ribeiro, David Antunes, Nuno Mamede, Jorge Baptista


Abstract
Orthographic neighbors (ONs) play a central role in models of visual word recognition and have been shown to influence reading speed, lexical access, and literacy development. Despite their importance, resources providing detailed and flexible ON information remain scarce for European Portuguese. This paper introduces Portho, a corpus-based lexical resource that provides multiple ON metrics for over 43,000 word forms, using several ON definitions. In addition to classical neighborhood size measures, Portho provides frequency-based statistics and graded orthographic distance (OD) features. We analyze the statistical properties of the resource and evaluate its empirical utility in automatic text complexity assessment using the iRead4Skills corpus. Results show that while ON features alone are insufficient to predict readability, they contribute complementary information and compare favorably with existing resources for Portuguese. Portho is made publicly available in different formats to support research in psycholinguistics, readability modeling, and Natural Language Processing (NLP) for Portuguese.
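To make the notion of neighborhood size concrete, the following is a minimal sketch of one classical ON definition: two words are neighbors when they have the same length and differ in exactly one letter position (substitution neighbors). The abstract does not state which definitions Portho uses, so this definition is an assumption. The function names, the toy lexicon, and the frequency counts are all hypothetical and are not taken from Portho or its source corpus.

```python
from collections import defaultdict


def substitution_neighbors(lexicon):
    """Map each word to the set of same-length words differing in exactly one letter.

    Assumed definition (substitution neighbors); not confirmed as Portho's.
    """
    words = set(lexicon)
    buckets = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            # Words sharing the same prefix and suffix around position i
            # can differ only in the letter at position i.
            buckets[(word[:i], word[i + 1:])].add(word)

    neighbors = {word: set() for word in words}
    for group in buckets.values():
        for word in group:
            neighbors[word] |= group - {word}
    return neighbors


def higher_frequency_neighbors(word, neighbors, freq):
    """Return the neighbors of `word` whose frequency exceeds that of `word`."""
    return sorted(n for n in neighbors[word] if freq[n] > freq[word])


# Toy lexicon and invented frequency counts, for illustration only.
toy_lexicon = ["gato", "pato", "rato", "mato", "gata", "casa"]
toy_freq = {"gato": 120, "pato": 40, "rato": 60, "mato": 15, "gata": 30, "casa": 900}

neighbors = substitution_neighbors(toy_lexicon)
neighborhood_size = {word: len(ns) for word, ns in neighbors.items()}
pato_higher = higher_frequency_neighbors("pato", neighbors, toy_freq)
```

In this toy lexicon, "gato" has four neighbors ("pato", "rato", "mato", "gata") and "casa" has none. Grouping words by masked-position patterns avoids comparing every pair of words, which matters for a lexicon with tens of thousands of word forms.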
Anthology ID:
2026.propor-1.40
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
406–415
URL:
https://preview.aclanthology.org/ingest-dnd/2026.propor-1.40/
Cite (ACL):
Eugénio Ribeiro, David Antunes, Nuno Mamede, and Jorge Baptista. 2026. Portho: A Corpus-Based Resource of Orthographic Neighbors in European Portuguese. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 406–415, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
Portho: A Corpus-Based Resource of Orthographic Neighbors in European Portuguese (Ribeiro et al., PROPOR 2026)
PDF:
https://preview.aclanthology.org/ingest-dnd/2026.propor-1.40.pdf