Abstract
We present our submission to the SIGTYP 2020 Shared Task on the prediction of typological features. We submit a constrained system, predicting typological features based only on the WALS database. We investigate two approaches. The simpler of the two is a system based on estimating the correlation of feature values within languages by computing conditional probabilities and mutual information. The second approach is to train a neural predictor operating on precomputed language embeddings based on WALS features. Our submitted system combines the two approaches based on their self-estimated confidence scores. We reach an accuracy of 70.7% on the test data and rank first in the shared task.
- Anthology ID:
- 2020.sigtyp-1.4
- Volume:
- Proceedings of the Second Workshop on Computational Research in Linguistic Typology
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Ekaterina Vylomova, Edoardo M. Ponti, Eitan Grossman, Arya D. McCarthy, Yevgeni Berzak, Haim Dubossarsky, Ivan Vulić, Roi Reichart, Anna Korhonen, Ryan Cotterell
- Venue:
- SIGTYP
- SIG:
- SIGTYP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 29–35
- URL:
- https://aclanthology.org/2020.sigtyp-1.4
- DOI:
- 10.18653/v1/2020.sigtyp-1.4
- Cite (ACL):
- Martin Vastl, Daniel Zeman, and Rudolf Rosa. 2020. Predicting Typological Features in WALS using Language Embeddings and Conditional Probabilities: ÚFAL Submission to the SIGTYP 2020 Shared Task. In Proceedings of the Second Workshop on Computational Research in Linguistic Typology, pages 29–35, Online. Association for Computational Linguistics.
- Cite (Informal):
- Predicting Typological Features in WALS using Language Embeddings and Conditional Probabilities: ÚFAL Submission to the SIGTYP 2020 Shared Task (Vastl et al., SIGTYP 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2020.sigtyp-1.4.pdf
- Code:
- ufal/ST2020