Abstract
This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture. We augment our models with embeddings representing language ID, part of speech, and other features such as word embeddings. We find that a highly augmented model shows the highest accuracy in predicting held-out forms, and investigate other properties of interest learned by our models’ representations. We outline extensions to this architecture that can better capture variation in Indo-Aryan sound change.
- Anthology ID:
- 2020.conll-1.50
- Volume:
- Proceedings of the 24th Conference on Computational Natural Language Learning
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- CoNLL
- SIG:
- SIGNLL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 620–630
- URL:
- https://aclanthology.org/2020.conll-1.50
- DOI:
- 10.18653/v1/2020.conll-1.50
- Cite (ACL):
- Chundra Cathcart and Taraka Rama. 2020. Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 620–630, Online. Association for Computational Linguistics.
- Cite (Informal):
- Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping (Cathcart & Rama, CoNLL 2020)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2020.conll-1.50.pdf
- Code:
- chundrac/ia-conll-2020