Adam Wiemerslage

2023

pdf abs
An Investigation of Noise in Morphological Inflection
Adam Wiemerslage | Changbing Yang | Garrett Nicolai | Miikka Silfverberg | Katharina Kann
Findings of the Association for Computational Linguistics: ACL 2023

With a growing focus on morphological inflection systems for languages where high-quality data is scarce, training data noise is a serious but so far largely ignored concern. We aim at closing this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and its impact on morphological inflection systems: First, we propose an error taxonomy and annotation pipeline for inflection training data. Then, we compare the effect of different types of noise on multiple state-of-the- art inflection models. Finally, we propose a novel character-level masked language modeling (CMLM) pretraining objective and explore its impact on the models’ resistance to noise. Our experiments show that various architectures are impacted differently by separate types of noise, but encoder-decoders tend to be more robust to noise than models trained with a copy bias. CMLM pretraining helps transformers, but has lower impact on LSTMs.

2022

pdf abs
A Comprehensive Comparison of Neural Networks as Cognitive Models of Inflection
Adam Wiemerslage | Shiran Dudy | Katharina Kann
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Neural networks have long been at the center of a debate around the cognitive mechanism by which humans process inflectional morphology. This debate has gravitated into NLP by way of the question: Are neural networks a feasible account for human behavior in morphological inflection?We address that question by measuring the correlation between human judgments and neural network probabilities for unknown word inflections. We test a larger range of architectures than previously studied on two important tasks for the cognitive processing debate: English past tense, and German number inflection. We find evidence that the Transformer may be a better account of human behavior than LSTMs on these datasets, and that LSTM features known to increase inflection accuracy do not always result in more human-like behavior.

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent developments in computational morphology with a focus on low-resource languages. Second, we argue that the field is ready to tackle the logical next challenge: understanding a language’s morphology from raw text alone. We perform an empirical study on a truly unsupervised version of the paradigm completion task and show that, while existing state-of-the-art models bridged by two newly proposed models we devise perform reasonably, there is still much room for improvement. The stakes are high: solving this task will increase the language coverage of morphological resources by a number of magnitudes.

2021

pdf abs
Findings of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering
Adam Wiemerslage | Arya D. McCarthy | Alexander Erdmann | Garrett Nicolai | Manex Agirrezabal | Miikka Silfverberg | Mans Hulden | Katharina Kann
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We describe the second SIGMORPHON shared task on unsupervised morphology: the goal of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering is to cluster word types from a raw text corpus into paradigms. To this end, we release corpora for 5 development and 9 test languages, as well as gold partial paradigms for evaluation. We receive 14 submissions from 4 teams that follow different strategies, and the best performing system is based on adaptor grammars. Results vary significantly across languages. However, all systems are outperformed by a supervised lemmatizer, implying that there is still room for improvement.

pdf abs
Paradigm Clustering with Weighted Edit Distance
Andrew Gerlach | Adam Wiemerslage | Katharina Kann
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper describes our system for the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering, which asks participants to group inflected forms together according their underlying lemma without the aid of annotated training data. We employ agglomerative clustering to group word forms together using a metric that combines an orthographic distance and a semantic distance from word embeddings. We experiment with two variations of an edit distance-based model for quantifying orthographic distance, but, due to time constraints, our system does not improve over the shared task’s baseline system.

2020

Automated Essay Scoring (AES) can be used to automatically generate holistic scores with reliability comparable to human scoring. In addition, AES systems can provide formative feedback to learners, typically at the essay level. In contrast, we are interested in providing feedback specialized to the content of the essay, and specifically for the content areas required by the rubric. A key objective is that the feedback should be localized alongside the relevant essay text. An important step in this process is determining where in the essay the rubric designated points and topics are discussed. A natural approach to this task is to train a classifier using manually annotated data; however, collecting such data is extremely resource intensive. Instead, we propose a method to predict these annotation spans without requiring any labeled annotation data. Our approach is to consider AES as a Multiple Instance Learning (MIL) task. We show that such models can both predict content scores and localize content by leveraging their sentence-level score predictions. This capability arises despite never having access to annotation training data. Implications are discussed for improving formative feedback and explainable AES models.

2018

pdf
Morphological Reinflection in Context: CU Boulder’s Submission to CoNLL–SIGMORPHON 2018 Shared Task
Ling Liu | Ilamvazhuthy Subbiah | Adam Wiemerslage | Jonathan Lilley | Sarah Moeller
Proceedings of the CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

pdf abs
Phonological Features for Morphological Inflection
Adam Wiemerslage | Miikka Silfverberg | Mans Hulden
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Modeling morphological inflection is an important task in Natural Language Processing. In contrast to earlier work that has largely used orthographic representations, we experiment with this task in a phonetic character space, representing inputs as either IPA segments or bundles of phonological distinctive features. We show that both of these inputs, somewhat counterintuitively, achieve similar accuracies on morphological inflection, slightly lower than orthographic models. We conclude that providing detailed phonological representations is largely redundant when compared to IPA segments, and that articulatory distinctions relevant for word inflection are already latently present in the distributional properties of many graphemic writing systems.

2017

pdf
Data Augmentation for Morphological Reinflection
Miikka Silfverberg | Adam Wiemerslage | Ling Liu | Lingshuang Jack Mao
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection