2022
SIGMORPHON 2022 Shared Task on Morpheme Segmentation Submission Description: Sequence Labelling for Word-Level Morpheme Segmentation
Leander Girrbach
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
We propose a sequence labelling approach to word-level morpheme segmentation. Segmentation labels are edit operations derived from a modified minimum edit distance alignment. We show that sequence labelling performs well for both “shallow segmentation” and “canonical segmentation”, achieving an F1 score of 96.06 (macro-averaged over all languages in the shared task) and ranking 3rd among all participating teams. We therefore conclude that sequence labelling is a promising approach to morpheme segmentation.
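To illustrate how edit operations can serve as per-character labels, here is a minimal sketch. It assumes plain Levenshtein alignment with an arbitrary tie-breaking order (diagonal, then insertion, then deletion); the abstract describes a *modified* minimum edit distance whose details are not given here, so this is not the paper's alignment. The function names `edit_labels` and `apply_labels`, the `+` boundary marker and the `(operation, inserted_string)` label format are all hypothetical. Each source character receives one label, so a character-level tagger could be trained to predict them.

```python
from typing import List, Tuple

Label = Tuple[str, str]  # hypothetical format: (base operation, string inserted after the char)


def levenshtein_table(src: str, tgt: str) -> List[List[int]]:
    """Standard (unmodified) Levenshtein DP table."""
    n, m = len(src), len(tgt)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # delete src[i-1]
                dist[i][j - 1] + 1,             # insert tgt[j-1]
                dist[i - 1][j - 1] + sub_cost,  # copy or substitute
            )
    return dist


def edit_labels(src: str, tgt: str) -> Tuple[str, List[Label]]:
    """Return (insertions before the first char, one label per source char)."""
    dist = levenshtein_table(src, tgt)
    ops: List[str] = []  # collected backwards from the end of both strings
    i, j = len(src), len(tgt)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub_cost = 0 if src[i - 1] == tgt[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub_cost:
                ops.append("C" if sub_cost == 0 else "S:" + tgt[j - 1])
                i, j = i - 1, j - 1
                continue
        if j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ops.append("I:" + tgt[j - 1])
            j -= 1
            continue
        ops.append("D")  # only remaining option, so i > 0 here
        i -= 1
    ops.reverse()

    # Attach each insertion to the preceding source character.
    prefix = ""
    labels: List[Label] = []
    for op in ops:
        if op.startswith("I:"):
            if labels:
                base, inserted = labels[-1]
                labels[-1] = (base, inserted + op[2:])
            else:
                prefix += op[2:]
        else:
            labels.append((op, ""))
    return prefix, labels


def apply_labels(src: str, prefix: str, labels: List[Label]) -> str:
    """Rebuild the segmented form from the source word and its labels."""
    out = [prefix]
    for ch, (op, inserted) in zip(src, labels):
        if op == "C":
            out.append(ch)
        elif op.startswith("S:"):
            out.append(op[2:])
        # "D" emits nothing for this character
        out.append(inserted)
    return "".join(out)


prefix, labels = edit_labels("walked", "walk+ed")
print(prefix, labels)
```

For shallow segmentation the labels reduce to "copy" or "copy and insert a boundary" (here the `k` of `walked` gets `("C", "+")`); for canonical segmentation, substitutions and deletions also appear, e.g. when aligning `happiness` with `happy+ness`.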
SIGMORPHON 2022 Task 0 Submission Description: Modelling Morphological Inflection with Data-Driven and Rule-Based Approaches
Tatiana Merzhevich, Nkonye Gbadegoye, Leander Girrbach, Jingwen Li, Ryan Soh-Eun Shim
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
This paper describes our participation in the 2022 SIGMORPHON-UniMorph Shared Task on Typologically Diverse and Acquisition-Inspired Morphological Inflection Generation. We present two approaches: a modification of the neural baseline encoder-decoder model, and hand-coded morphological analyzers built with finite-state tools (FST) and outside linguistic knowledge. While our modified encoder-decoder model underperforms the baseline for almost all languages, the FST methods outperform other systems in the respective languages by a large margin. This confirms that purely data-driven approaches have not yet reached the maturity needed to replace trained linguists for documentation and analysis, especially for low-resource and endangered languages.
Text Complexity DE Challenge 2022 Submission Description: Pairwise Regression for Complexity Prediction
Leander Girrbach
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text
This paper describes our submission to the Text Complexity DE Challenge 2022 (Mohtaj et al., 2022). We evaluate a pairwise regression model that predicts the relative difference in complexity between two sentences, instead of predicting a complexity score from a single sentence. Consequently, the model returns a sample of scores (one per training sentence) instead of a point estimate. Due to an error in the submission, test set results are unavailable. However, we show by cross-validation that pairwise regression does not improve performance over standard regression models that take sentence embeddings from pretrained language models as input. Furthermore, we do not find that the standard deviations of the score distributions reflect differences in the “uncertainty” of the model predictions in a useful way.
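The step from a pairwise difference model to "one score per training sentence" can be sketched as follows: each training sentence with known score y_i acts as an anchor, and the estimate for a new sentence x is y_i plus the predicted difference between x and that anchor. This is a minimal illustration under stated assumptions: `predict_diff` is a hypothetical stand-in (a fixed linear function of embedding differences with made-up weights and toy 2-dimensional "embeddings"), not the trained regressor or pretrained-LM embeddings used in the paper, and summarizing by mean and population standard deviation is one natural choice rather than the paper's confirmed procedure.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
DiffModel = Callable[[Vector, Vector], float]  # predicts score(a) - score(b)


def score_samples(x: Vector, train_x: List[Vector], train_y: List[float],
                  predict_diff: DiffModel) -> List[float]:
    """One sample per training sentence: anchor score + predicted difference."""
    return [y_i + predict_diff(x, x_i) for x_i, y_i in zip(train_x, train_y)]


def summarize(samples: List[float]) -> Tuple[float, float]:
    """Mean as a point estimate, population std as a spread measure."""
    return mean(samples), pstdev(samples)


# Hypothetical pairwise model: a fixed linear function of the embedding difference.
WEIGHTS = (1.0, 2.0)


def linear_diff(a: Vector, b: Vector) -> float:
    return sum(w * (ak - bk) for w, ak, bk in zip(WEIGHTS, a, b))


# Toy data: 2-d "embeddings" with slightly noisy complexity scores.
train_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = [1.0, 2.0, 3.5]

samples = score_samples([2.0, 1.0], train_x, train_y, linear_diff)
print(samples)
print(summarize(samples))
```

Here the three anchors yield the samples 4.0, 4.0 and 4.5; the spread comes entirely from the noise in the anchor scores, which hints at why such a standard deviation need not track the model's actual uncertainty.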