Abstract
This paper describes a workflow to impute missing values in a typological database, a subset of the World Atlas of Language Structures (WALS). Using a world-wide phylogeny derived from lexical data, the model assumes a phylogenetic continuous-time Markov chain governing the evolution of typological values. Data imputation is performed via Maximum Likelihood estimation on the basis of this model. As a back-off model for languages whose phylogenetic position is unknown, a k-nearest neighbor classification based on geographic distance is performed.
- Anthology ID:
- 2020.sigtyp-1.5
- Volume:
- Proceedings of the Second Workshop on Computational Research in Linguistic Typology
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- SIGTYP
- SIG:
- SIGTYP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 36–42
- URL:
- https://aclanthology.org/2020.sigtyp-1.5
- DOI:
- 10.18653/v1/2020.sigtyp-1.5
- Cite (ACL):
- Gerhard Jäger. 2020. Imputing typological values via phylogenetic inference. In Proceedings of the Second Workshop on Computational Research in Linguistic Typology, pages 36–42, Online. Association for Computational Linguistics.
- Cite (Informal):
- Imputing typological values via phylogenetic inference (Jäger, SIGTYP 2020)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2020.sigtyp-1.5.pdf
- Code
- gerhardjaeger/emnlp2020