Principal Parts Detection for Computational Morphology: Task, Models and Benchmark

Dorin Keshales; Omer Goldman; Reut Tsarfaty

Abstract

Principal parts of an inflectional paradigm, defined as the minimal set of paradigm cells required to deduce all others, constitute an important concept in theoretical morphology. This concept, which outlines the minimal memorization needed for a perfect inflector, has been largely overlooked in computational morphology despite impressive advances in the field over the last decade. In this work, we posit Principal Parts Detection as a computational task and construct a multilingual dataset of verbal principal parts covering ten languages, based on Wiktionary entries. We evaluate an array of Principal Parts Detection methods, all of which follow the same schema: characterize the relationships between each pair of inflectional categories, cluster the resulting vector representations, and select a representative of each cluster as a predicted principal part. Our best-performing model, based on Edit Script between inflections and using Hierarchical K-Means, achieves an F1 score of 55.05%, significantly outperforming a random baseline of 21.20%. While our results demonstrate that some success is achievable, further work is needed to thoroughly solve Principal Parts Detection, a task that may be used to further optimize inputs for morphological inflection, and to promote research into the theoretical and practical importance of a compact representation of morphological paradigms.
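The three-step schema described in the abstract has two further stages after the pairwise characterization: clustering, then selection of one representative per cluster. The sketch below is a minimal illustration of that schema under stated assumptions, not the authors' implementation. Each of the following is a hypothetical choice made for illustration: the toy data, the function names (`edit_script`, `cell_vectors`, `kmeans`, `detect_principal_parts`, `f1`), the pairwise feature (the share of distinct position-free edit scripts between two cells across lemmas), plain k-means as a stand-in for Hierarchical K-Means, and the choice of the cell nearest each centroid as its representative.

```python
from difflib import SequenceMatcher
import math

# Toy verb paradigms (hypothetical data): each dict maps a cell name to a form.
CELLS = ["INF", "PST", "PTCP"]
PARADIGMS = [
    {"INF": "sing", "PST": "sang", "PTCP": "sung"},
    {"INF": "ring", "PST": "rang", "PTCP": "rung"},
    {"INF": "walk", "PST": "walked", "PTCP": "walked"},
    {"INF": "talk", "PST": "talked", "PTCP": "talked"},
]


def edit_script(src, tgt):
    """Position-free edit script turning src into tgt (simplified assumption)."""
    ops = SequenceMatcher(None, src, tgt, autojunk=False).get_opcodes()
    return tuple((tag, src[i1:i2], tgt[j1:j2])
                 for tag, i1, i2, j1, j2 in ops if tag != "equal")


def cell_vectors(paradigms, cells):
    """Row i, column j: distinct edit scripts from cell i to cell j, divided by
    the number of lemmas. Lower values mean j is more predictable from i."""
    n = len(paradigms)
    return {i: [len({edit_script(p[i], p[j]) for p in paradigms}) / n for j in cells]
            for i in cells}


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation.
    Stand-in for Hierarchical K-Means, which this sketch does not implement."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centers


def detect_principal_parts(paradigms, cells, k):
    """Characterize cell pairs, cluster cells, keep the member nearest each centroid."""
    vecs = cell_vectors(paradigms, cells)
    labels, centers = kmeans([vecs[c] for c in cells], k)
    parts = []
    for c in range(k):
        members = [cell for cell, lab in zip(cells, labels) if lab == c]
        if members:
            parts.append(min(members, key=lambda m: math.dist(vecs[m], centers[c])))
    return parts


def f1(pred, gold):
    """Set-overlap F1 between predicted and gold principal-part cells (assumed metric)."""
    tp = len(set(pred) & set(gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(set(pred)), tp / len(set(gold))
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(detect_principal_parts(PARADIGMS, CELLS, k=2))
```

In this sketch the number of predicted principal parts is bounded by the `k` argument; the abstract does not say how the number of clusters is chosen, so that remains an open assumption here. The `f1` helper treats predictions and gold principal parts as sets of cells, which is one plausible reading of the F1 score reported above rather than a confirmed description of the benchmark's evaluation protocol.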

Anthology ID:: 2025.conll-1.17
Volume:: Proceedings of the 29th Conference on Computational Natural Language Learning
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Gemma Boleda, Michael Roth
Venues:: CoNLL | WS
Publisher:: Association for Computational Linguistics
Pages:: 251–267
URL:: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.conll-1.17/
Cite (ACL):: Dorin Keshales, Omer Goldman, and Reut Tsarfaty. 2025. Principal Parts Detection for Computational Morphology: Task, Models and Benchmark. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 251–267, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Principal Parts Detection for Computational Morphology: Task, Models and Benchmark (Keshales et al., CoNLL 2025)
PDF:: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.conll-1.17.pdf