Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping

Chundra Cathcart; Taraka Rama

doi:10.18653/v1/2020.conll-1.50

Abstract

This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture. We augment our models with embeddings representing language ID, part of speech, and other features such as word embeddings. We find that a highly augmented model shows highest accuracy in predicting held-out forms, and investigate other properties of interest learned by our models’ representations. We outline extensions to this architecture that can better capture variation in Indo-Aryan sound change.

Anthology ID: 2020.conll-1.50
Volume: Proceedings of the 24th Conference on Computational Natural Language Learning
Month: November
Year: 2020
Address: Online
Venue: CoNLL
SIG: SIGNLL
Publisher: Association for Computational Linguistics
Pages: 620–630
URL: https://aclanthology.org/2020.conll-1.50
DOI: 10.18653/v1/2020.conll-1.50
Cite (ACL): Chundra Cathcart and Taraka Rama. 2020. Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 620–630, Online. Association for Computational Linguistics.
Cite (Informal): Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping (Cathcart & Rama, CoNLL 2020)
PDF: https://preview.aclanthology.org/starsem-semeval-split/2020.conll-1.50.pdf
Code: chundrac/ia-conll-2020