Matthew Baerman


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2024

pdf bib
Eesthetic: A Paralex Lexicon of Estonian Paradigms
Sacha Beniamine | Mari Aigro | Matthew Baerman | Jules Bouton | Maria Copot
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We introduce Eesthetic, a comprehensive Estonian noun and verb lexicon sourced from the Ekilex database. It documents 5475 nouns inflecting for 28 paradigm cells and 5076 verbs inflecting for 51 cells, and comprises a total of 452885 inflected forms. Our openly accessible machine-readable dataset adheres to the Paralex standard. It comprises CSV tables linked by formal relationships. Metadata in JSON format, following the Frictionless standard, provides detailed descriptions of the tables and dataset. The lexicon offers extensive linguistic annotations, including orthographic forms, automatically transcribed phonemic transcriptions, non-canonical morphological phenomena such as overabundance and defectiveness, rich mapping of the paradigm cells and feature-values to other notation schemes, a decomposition of phonemes in distinctive features, and annotation of inflection classes. It is suited for both monolingual and comparative research, enabling qualitative and quantitative analysis. This paper outlines the creation process, rationale, and resulting structure, along with our set of rules for automatic orthography-to-phonemic transcription conversion.