Lemmas Matter, But Not Like That: Predictors of Lemma-Based Generalization in Morphological Inflection

Sarah Ruth Brogden Payne, Jordan Kodner


Abstract
Recent work has suggested that overlap (whether a given lemma or feature set is attested independently in the training set) drives model performance on morphological inflection tasks. The impact of lemma overlap, however, is debated, with recent work reporting accuracy drops ranging from 0% to 30% between seen and unseen test lemmas. In this paper, we introduce a novel splitting algorithm designed to investigate predictors of accuracy on seen and unseen lemmas. We find only an 11% average drop from seen to unseen test lemmas, but show that the number of lemmas in the training set has a much stronger effect on accuracy on unseen lemmas than on seen lemmas. We also show that the previously reported 30% drop is inflated, because the novel splits used in that work contain nearly 30% fewer training lemmas than the original splits.
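To make the notion of lemma overlap concrete, the following is a minimal sketch of how test items might be partitioned into seen and unseen lemmas and scored separately. It is not the splitting algorithm introduced in the paper; the `Item` type, the function names, the toy English data, and the predictions are all hypothetical and serve only as illustration.

```python
from typing import NamedTuple, Optional


class Item(NamedTuple):
    """One inflection example: lemma + feature set -> inflected form."""
    lemma: str
    features: str
    form: str


def split_by_lemma_overlap(train, test):
    """Partition test items by whether their lemma is attested in train."""
    train_lemmas = {item.lemma for item in train}
    seen = [item for item in test if item.lemma in train_lemmas]
    unseen = [item for item in test if item.lemma not in train_lemmas]
    return seen, unseen


def accuracy(items, predictions) -> Optional[float]:
    """Exact-match accuracy; predictions maps (lemma, features) -> form."""
    if not items:
        return None
    correct = sum(
        predictions.get((item.lemma, item.features)) == item.form
        for item in items
    )
    return correct / len(items)


# Toy data (hypothetical, for illustration only).
train = [
    Item("walk", "V;PST", "walked"),
    Item("walk", "V;PRS;3;SG", "walks"),
    Item("jump", "V;PST", "jumped"),
]
test = [
    Item("walk", "V;V.PTCP;PRS", "walking"),  # lemma seen in train
    Item("jump", "V;PRS;3;SG", "jumps"),      # lemma seen in train
    Item("sing", "V;PST", "sang"),            # lemma unseen in train
    Item("talk", "V;PST", "talked"),          # lemma unseen in train
]
# Hypothetical model outputs, keyed by (lemma, features).
predictions = {
    ("walk", "V;V.PTCP;PRS"): "walking",
    ("jump", "V;PRS;3;SG"): "jumps",
    ("sing", "V;PST"): "singed",
    ("talk", "V;PST"): "talked",
}

seen, unseen = split_by_lemma_overlap(train, test)
seen_acc = accuracy(seen, predictions)
unseen_acc = accuracy(unseen, predictions)
drop = seen_acc - unseen_acc  # gap between seen-lemma and unseen-lemma accuracy
```

Comparing such a gap across different splits is only meaningful if the number of distinct training lemmas is held roughly constant, which is the confound the abstract points to.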
Anthology ID:
2025.findings-acl.1296
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
25270–25286
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.1296/
Cite (ACL):
Sarah Ruth Brogden Payne and Jordan Kodner. 2025. Lemmas Matter, But Not Like That: Predictors of Lemma-Based Generalization in Morphological Inflection. In Findings of the Association for Computational Linguistics: ACL 2025, pages 25270–25286, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Lemmas Matter, But Not Like That: Predictors of Lemma-Based Generalization in Morphological Inflection (Payne & Kodner, Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.1296.pdf