Tatjana Pakalne


2025

Database of Latvian Morphemes and Derivational Models: ideas and expected results
Andra Kalnača | Tatjana Pakalne | Kristīne Levāne-Petrova
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

In this paper, we describe “The Database of Latvian Morphemes and Derivational Models”, a large-scale, corpus-based, manually validated database of Latvian derivational morphology currently in development at the University of Latvia. The database contains two levels of data. Morpheme-level data covers morphemes, including morpheme variants (allomorphs), morpheme types, morpheme homonymy/homography resolution, hierarchical relations between root morphemes, and links to word families. Lemma-level data covers base form, morphemic segmentation, POS, grammatical features, derivational motivation (including compounding), and word-family membership. The focus of the database is on providing linguistically accurate, comprehensive data as a reliable basis for future work in different fields.
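
To make the two data levels named in the abstract concrete, the following is a minimal sketch of how a morpheme record and a lemma record might relate. It is not the database's actual schema: the class names `Morpheme` and `Lemma`, every field name, the helper functions, and the two sample entries (`galds` "table" and its diminutive `galdiņš`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Morpheme:
    """Hypothetical morpheme-level record (field names are assumptions)."""
    form: str       # surface form as it appears in a word (an allomorph)
    kind: str       # morpheme type, e.g. "root", "suffix", "ending"
    canonical: str  # the morpheme this variant is grouped under


@dataclass
class Lemma:
    """Hypothetical lemma-level record (field names are assumptions)."""
    base_form: str
    pos: str
    segmentation: list            # ordered list of Morpheme objects
    word_family: str              # identifier of the word family (here: the root)
    motivated_by: Optional[str] = None   # base form of the motivating lemma, if any
    features: dict = field(default_factory=dict)


def segmented(lemma: Lemma) -> str:
    """Join the morphemic segmentation with hyphens for display."""
    return "-".join(m.form for m in lemma.segmentation)


def family_members(lemmas: list, family: str) -> list:
    """Return all lemmas belonging to the given word family, in input order."""
    return [l for l in lemmas if l.word_family == family]


# Illustrative entries only; not taken from the database.
ROOT_GALD = Morpheme(form="gald", kind="root", canonical="gald")

galds = Lemma(
    base_form="galds",
    pos="noun",
    segmentation=[ROOT_GALD, Morpheme("s", "ending", "s")],
    word_family="gald",
    features={"gender": "masculine"},
)

galdins = Lemma(
    base_form="galdiņš",
    pos="noun",
    segmentation=[
        ROOT_GALD,
        Morpheme("iņ", "suffix", "iņ"),  # diminutive suffix
        Morpheme("š", "ending", "š"),
    ],
    word_family="gald",
    motivated_by="galds",  # derivational motivation: galds -> galdiņš
    features={"gender": "masculine"},
)

LEMMAS = [galds, galdins]

if __name__ == "__main__":
    for lemma in family_members(LEMMAS, "gald"):
        print(lemma.base_form, segmented(lemma), lemma.motivated_by)
```

In this sketch the lemma record points to its morphemes and to its motivating lemma, so word-family membership and derivational chains can both be read off the lemma level, while allomorph grouping stays on the morpheme level.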