Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists
Abstract
We present and evaluate two similarity dependent Chinese Restaurant Process (sd-CRP) algorithms on the task of automated cognate detection. The sd-CRP clustering algorithms do not require a predefined threshold for detecting cognate sets in a multilingual word list. We evaluate the algorithms on six language families (more than 750 languages) and find that both sd-CRP variants perform as well as InfoMap and better than UPGMA at inferring cognate clusters. The algorithms presented in this paper are family-agnostic and can be applied to any linguistically under-studied language family.
- Anthology ID:
- K18-1027
- Volume:
- Proceedings of the 22nd Conference on Computational Natural Language Learning
- Month:
- October
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Anna Korhonen, Ivan Titov
- Venue:
- CoNLL
- SIG:
- SIGNLL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 271–281
- URL:
- https://aclanthology.org/K18-1027
- DOI:
- 10.18653/v1/K18-1027
- Cite (ACL):
- Taraka Rama. 2018. Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 271–281, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists (Rama, CoNLL 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/K18-1027.pdf
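The abstract's central claim is that sd-CRP clustering needs no predefined threshold to decide how many cognate sets exist. The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions. It follows the distance-dependent CRP formulation of Blei and Frazier (2011), with a word-similarity matrix standing in for the decay function, and it draws a single partition from the prior. Each word links either to another word, with probability proportional to their similarity, or to itself, with probability proportional to a self-link weight `alpha`. Clusters are the connected components of these links. The function name `sample_sdcrp_partition`, the `alpha` parameter, and the toy matrix are hypothetical. The paper's similarity measure and inference procedure are not reproduced here.

```python
import random
from typing import List, Sequence


def sample_sdcrp_partition(
    sim: Sequence[Sequence[float]], alpha: float, seed: int = 0
) -> List[int]:
    """Draw one partition of n words from a similarity-dependent CRP prior.

    Hypothetical helper (not the paper's code): word i links to word j != i
    with probability proportional to sim[i][j], or to itself with probability
    proportional to alpha. Clusters are the connected components of the
    links, so no similarity threshold fixes the number of clusters.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    n = len(sim)

    # Each word picks exactly one link target (possibly itself).
    links = []
    for i in range(n):
        weights = [alpha if j == i else max(sim[i][j], 0.0) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])

    # Union-find: treat every link as an undirected edge.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Consider a block-structured toy matrix such as `[[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]`. Here words 0 and 1 can never share a cluster with words 2 and 3, because zero similarity rules out any link between the two blocks. A larger `alpha` makes self-links, and therefore more and smaller clusters, more likely. A full sampler would go further and resample each word's link conditioned on the data rather than stopping at the prior.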