Automatic Verb Classifier for Abui (AVC-abz)
Frantisek Kratochvil, George Saad, Jiří Vomlel, Václav Kratochvíl
Abstract
We present an automatic verb classifier system that identifies inflectional classes in Abui (AVC-abz), a Papuan language of the Timor-Alor-Pantar family. The system combines manually annotated language data (the learning set) with the output of a morphological precision grammar (corpus data). The morphological precision grammar is trained on a smaller, fully glossed corpus and applied to a larger corpus. In the first step, the system uses the k-means algorithm to cluster the inflectional classes discovered in the learning set. In the second step, a Naive Bayes algorithm assigns the verbs found in the corpus data to the best-fitting cluster. AVC-abz serves to advance and refine the grammatical analysis of Abui as well as to monitor corpus coverage and its gradual improvement.
- Anthology ID:
- 2022.eurali-1.7
- Volume:
- Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Atul Kr. Ojha, Sina Ahmadi, Chao-Hong Liu, John P. McCrae
- Venue:
- EURALI
- Publisher:
- European Language Resources Association
- Pages:
- 42–50
- URL:
- https://aclanthology.org/2022.eurali-1.7
- Cite (ACL):
- Frantisek Kratochvil, George Saad, Jiří Vomlel, and Václav Kratochvíl. 2022. Automatic Verb Classifier for Abui (AVC-abz). In Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference, pages 42–50, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Automatic Verb Classifier for Abui (AVC-abz) (Kratochvil et al., EURALI 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.eurali-1.7.pdf
- Code:
- fanacek/avc-abz
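
The two-step procedure summarized in the abstract (k-means clustering of the learning set, then Naive Bayes assignment of corpus verbs) can be illustrated with a minimal sketch. This is not the code from the fanacek/avc-abz repository. The feature encoding (one binary value per hypothetical inflectional prefix series), the toy vectors, the farthest-first initialization, the Bernoulli variant of Naive Bayes with Laplace smoothing, and the treatment of unattested combinations as missing values are all assumptions made for illustration.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, max_iter=100):
    """Plain k-means; returns one cluster label per input vector."""
    # Assumption: deterministic farthest-first initialization.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every vector.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_bernoulli_nb(vectors, labels, k):
    """Per cluster: a smoothed prior and P(feature = 1 | cluster)."""
    n, n_features = len(vectors), len(vectors[0])
    model = []
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        prior = (len(members) + 1) / (n + k)
        probs = [(sum(v[j] for v in members) + 1) / (len(members) + 2)
                 for j in range(n_features)]
        model.append((prior, probs))
    return model

def assign(model, observed):
    """Index of the most probable cluster; None entries are skipped."""
    def log_score(prior, probs):
        score = math.log(prior)
        for x, p in zip(observed, probs):
            if x is None:  # combination not attested in the corpus
                continue
            score += math.log(p if x else 1 - p)
        return score
    return max(range(len(model)), key=lambda c: log_score(*model[c]))

# Hypothetical learning set: rows are verbs, columns are four
# hypothetical prefix series (1 = verb combines with that series).
learning_set = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],
]
labels = kmeans(learning_set, k=2)
model = train_bernoulli_nb(learning_set, labels, k=2)

# Hypothetical corpus verb attested only with the first series.
cluster = assign(model, [1, None, None, None])
```

Skipping `None` entries reflects one possible design choice: a corpus verb that has not yet been seen with a given prefix series is treated as unobserved rather than as incompatible with it, so the assignment can be revised as corpus coverage grows.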