Senka Drobac


Sub-label dependencies for Neural Morphological Tagging – The Joint Submission of University of Colorado and University of Helsinki for VarDial 2018
Miikka Silfverberg | Senka Drobac
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

This paper presents the submission of the UH&CU team (a joint University of Colorado and University of Helsinki team) for the VarDial 2018 shared task on morphosyntactic tagging of Croatian, Slovenian and Serbian tweets. Our system is a bidirectional LSTM tagger that emits tags as character sequences using an LSTM generator, which lets it handle the unknown tags and the combinations of several tags for one token that occur in the shared task data sets. To the best of our knowledge, using an LSTM generator is a novel approach. The system delivers sizable improvements of more than 6 percentage points over a baseline trigram tagger. Overall, the performance of our system is quite even across all three languages.


OCR and post-correction of historical Finnish texts
Senka Drobac | Pekka Kauppinen | Krister Lindén
Proceedings of the 21st Nordic Conference on Computational Linguistics


Automated Lossless Hyper-Minimization for Morphological Analyzers
Senka Drobac | Miikka Silfverberg | Krister Lindén
Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 (FSMNLP 2015 Düsseldorf)


Heuristic Hyper-minimization of Finite State Lexicons
Senka Drobac | Krister Lindén | Tommi Pirinen | Miikka Silfverberg
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Flag diacritics, which are special multi-character symbols executed at runtime, enable optimising finite-state networks by combining identical sub-graphs of their transition graphs. Traditionally, the feature has required linguists to devise the optimisations to the graph by hand alongside the morphological description. In this paper, we present a novel method for discovering flag positions in morphological lexicons automatically, based on the morpheme structure implicit in the language description. With this approach, we have achieved a significant decrease in the size of finite-state networks while maintaining reasonable application speed. The algorithm can be applied to any language description, and the biggest gains are expected in large and complex morphologies. We obtained the most noticeable reduction in size with a morphological transducer for Greenlandic, whose original size is on average about 15 times larger than that of other morphologies. With the presented hyper-minimization method, the transducer is reduced to 10.1% of its original size, while lookup speed decreases by only 9.5%.


Implementation of Replace Rules Using Preference Operator
Senka Drobac | Miikka Silfverberg | Anssi Yli-Jyrä
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing