@inproceedings{kosyak-tyers-2022-predictive,
    title = "Predictive Text for Agglutinative and Polysynthetic Languages",
    author = "Kosyak, Sergey  and
      Tyers, Francis",
    editor = "Serikov, Oleg  and
      Voloshina, Ekaterina  and
      Postnikova, Anna  and
      Klyachko, Elena  and
      Neminova, Ekaterina  and
      Vylomova, Ekaterina  and
      Shavrina, Tatiana  and
      Le Ferrand, Eric  and
      Malykh, Valentin  and
      Tyers, Francis  and
      Arkhangelskiy, Timofey  and
      Mikhailov, Vladislav  and
      Fenogenova, Alena",
    booktitle = "Proceedings of the First Workshop on NLP applications to field linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Conference on Computational Linguistics",
    url = "https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2022.fieldmatters-1.9/",
    pages = "77--85",
    abstract = "This paper presents a set of experiments in the area of morphological modelling and prediction. We test whether morphological segmentation can compete against statistical segmentation in the tasks of language modelling and predictive text entry for two under-resourced and indigenous languages, K{'}iche' and Chukchi. We use different segmentation methods {---} both statistical and morphological {---} to make datasets that are used to train models of different types: single-way segmented, which are trained using data from one segmenter; two-way segmented, which are trained using concatenated data from two segmenters; and finetuned, which are trained on two datasets from different segmenters. We compute word and character level perplexities and find that single-way segmented models trained on morphologically segmented data show the highest performance. Finally, we evaluate the language models on the task of predictive text entry using gold standard data and measure the average number of clicks per character and keystroke savings rate. We find that the models trained on morphologically segmented data show better scores, although with substantial room for improvement. At last, we propose the usage of morphological segmentation in order to improve the end-user experience while using predictive text and we plan on testing this assumption by doing end-user evaluation."
}