Automatic Inference of Sound Correspondence Patterns across Multiple Languages

Johann-Mattis List


Abstract
Sound correspondence patterns play a crucial role in linguistic reconstruction. Linguists use them to prove language relationship, to reconstruct proto-forms, and for classical phylogenetic reconstruction based on shared innovations. Cognate words that fail to conform to expected patterns can further point to various kinds of exceptions in sound change, such as analogy or assimilation of frequent words. Here I present an automatic method for the inference of sound correspondence patterns across multiple languages based on a network approach. The core idea is to represent all columns in aligned cognate sets as nodes in a network, with edges representing the degree of compatibility between the nodes. The task of inferring all compatible correspondence sets can then be handled as the well-known minimum clique cover problem in graph theory, which essentially seeks to partition the graph into the smallest number of cliques such that each node belongs to exactly one clique. The resulting partitions represent all correspondence patterns that can be inferred for a given data set. By excluding those patterns that occur in only a few cognate sets, the core of regularly recurring sound correspondences can be inferred. Based on this idea, the article presents a method for automatic correspondence pattern recognition, implemented as part of a Python library that supplements the article. To illustrate the usefulness of the method, I show how the inferred patterns can be used to predict words that have not been observed before.
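The network idea in the abstract can be illustrated with a minimal sketch. It is not the implementation in lingpy/lingrex, and it does not solve the minimum clique cover problem exactly; it uses a simple greedy heuristic instead. The sketch assumes that each alignment column is a tuple with one sound per language, that `None` marks a language with no reflex, and that two columns count as compatible when they share at least one attested language and agree wherever both are attested. The names `compatible`, `greedy_clique_cover`, and the toy data are hypothetical.

```python
MISSING = None  # assumed marker for a language without a reflex in a column


def compatible(col_a, col_b):
    """Return True if two columns agree in every language attested in both.

    Assumption: at least one language must be attested in both columns.
    """
    shared = [
        (a, b)
        for a, b in zip(col_a, col_b)
        if a is not MISSING and b is not MISSING
    ]
    return bool(shared) and all(a == b for a, b in shared)


def greedy_clique_cover(columns):
    """Partition column indices into cliques of pairwise compatible columns.

    Greedy heuristic: visit the best-attested columns first and put each
    one into the first clique whose members are all compatible with it.
    The result is a valid clique cover, but not necessarily a minimum one.
    """
    order = sorted(
        range(len(columns)),
        key=lambda i: -sum(v is not MISSING for v in columns[i]),
    )
    cliques = []
    for i in order:
        for clique in cliques:
            if all(compatible(columns[i], columns[j]) for j in clique):
                clique.append(i)
                break
        else:
            cliques.append([i])
    return cliques


# Toy columns over three hypothetical languages (invented data).
columns = [
    ("p", "p", "f"),
    ("p", None, "f"),
    ("t", "t", "t"),
    (None, "p", "f"),
    ("t", "d", None),
]

cliques = greedy_clique_cover(columns)
# Keep only patterns that recur in at least two columns.
regular = [clique for clique in cliques if len(clique) >= 2]
print(cliques)
print(regular)
```

In this toy data, columns 0, 1, and 3 fall into one pattern, while columns 2 and 4 each remain alone and are dropped by the frequency filter, mirroring the exclusion of patterns that occur in only a few cognate sets.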
Anthology ID:
J19-1004
Volume:
Computational Linguistics, Volume 45, Issue 1 - March 2019
Month:
March
Year:
2019
Address:
Cambridge, MA
Venue:
CL
Publisher:
MIT Press
Pages:
137–161
URL:
https://aclanthology.org/J19-1004
DOI:
10.1162/coli_a_00344
Cite (ACL):
Johann-Mattis List. 2019. Automatic Inference of Sound Correspondence Patterns across Multiple Languages. Computational Linguistics, 45(1):137–161.
Cite (Informal):
Automatic Inference of Sound Correspondence Patterns across Multiple Languages (List, CL 2019)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/J19-1004.pdf
Code
lingpy/lingrex