Abstract
Most NLP resources that offer annotations at the word segment level provide morphological annotation that includes features indicating tense, aspect, modality, gender, case, and other inflectional information. Such information is rarely aligned to the relevant parts of the words (i.e., the allomorphs), as such annotation would be very costly. These unaligned weak labelings are commonly provided by annotated NLP corpora such as treebanks in various languages. Although they lack alignment information, the presence/absence of labels at the word level is also consistent with the amount of supervision assumed to be provided to L1 and L2 learners. In this paper, we explore several methods to learn this latent alignment between parts of word forms and the grammatical information provided. All the methods under investigation favor hypotheses in which morphemes reuse a small inventory of allomorphs, i.e., they implicitly minimize the number of allomorphs that a morpheme can be realized as. We show that the provided information offers a significant advantage for both word segmentation and the learning of allomorphy.
- Anthology ID:
- W17-4107
- Volume:
- Proceedings of the First Workshop on Subword and Character Level Models in NLP
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Venue:
- SCLeM
- Publisher:
- Association for Computational Linguistics
- Pages:
- 46–56
- URL:
- https://aclanthology.org/W17-4107
- DOI:
- 10.18653/v1/W17-4107
- Cite (ACL):
- Miikka Silfverberg and Mans Hulden. 2017. Weakly supervised learning of allomorphy. In Proceedings of the First Workshop on Subword and Character Level Models in NLP, pages 46–56, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Weakly supervised learning of allomorphy (Silfverberg & Hulden, SCLeM 2017)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/W17-4107.pdf