MorfFlex: Handling Rich Morphology

Jaroslava Hlaváčová, Marie Mikulová, Barbora Štěpánková, Milan Straka, Jan Hajič


Abstract
We present MorfFlex, a morphological dictionary architecture suitable for languages with extensive regularity in both inflection and derivation. As the primary example of MorfFlex in use, we introduce MorfFlex CZ, a morphological dictionary of Czech. It is distributed as a simple, unstructured list of <wordform, lemma, tag> triplets; however, its manually maintained, unpublished source files and conversion scripts encode a sophisticated system of inflectional and derivational patterns. These patterns dramatically reduce the otherwise enormous size of the dictionary, which currently contains over 100 million wordforms and more than 1 million lemmas. The MorfFlex CZ dictionary serves as an essential resource for ensuring the consistency of manual morphological annotation in the Prague Dependency Treebanks and underpins state-of-the-art automatic tools such as MorphoDiTa. In this paper, we focus on (i) presenting an effective method for managing the rich morphological system within the dictionary, and (ii) demonstrating the utility of such a language resource for maintaining annotation consistency in corpora and supporting the development of advanced NLP applications.
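To illustrate the distributed form described in the abstract, the following is a minimal sketch of how a flat list of <wordform, lemma, tag> triplets might be indexed for lookup. The tab-separated layout, the column order, the function name `build_index`, and the sample entries are assumptions made for illustration only; they are not taken from the paper or from the actual MorfFlex CZ release.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each wordform to all of its (lemma, tag) analyses.

    Assumes one triplet per line, tab-separated, in the order
    wordform, lemma, tag (a hypothetical layout).
    """
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        wordform, lemma, tag = line.split("\t")
        index[wordform].append((lemma, tag))
    return dict(index)


# Illustrative entries only; the tags are placeholders in a positional style.
SAMPLE = [
    "žena\tžena\tNNFS1-----A----",
    "ženy\tžena\tNNFS2-----A----",
    "ženy\tžena\tNNFP1-----A----",
]

index = build_index(SAMPLE)
print(index["ženy"])  # every analysis recorded for the ambiguous form
```

A single wordform can map to several analyses, which is why the index stores a list per form rather than a single pair.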
Anthology ID:
2026.lrec-main.899
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
11495–11505
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.899/
Cite (ACL):
Jaroslava Hlaváčová, Marie Mikulová, Barbora Štěpánková, Milan Straka, and Jan Hajič. 2026. MorfFlex: Handling Rich Morphology. International Conference on Language Resources and Evaluation, main:11495–11505.
Cite (Informal):
MorfFlex: Handling Rich Morphology (Hlaváčová et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.899.pdf