Computerized Forward Reconstruction for Analysis in Diachronic Phonology, and Latin to French Reflex Prediction

Clayton Marr, David R. Mortensen


Abstract
Traditionally, historical phonologists have relied on tedious manual derivations to calibrate the sequences of sound changes that shaped the phonological evolution of languages. However, humans are prone to errors, and cannot track thousands of parallel word derivations in any efficient manner. We propose to instead automatically derive each lexical item in parallel, and we demonstrate forward reconstruction as both a computational task with metrics to optimize, and as an empirical tool for inquiry. For this end we present DiaSim, a user-facing application that simulates “cascades” of diachronic developments over a language’s lexicon and provides diagnostics for “debugging” those cascades. We test our methodology on a Latin-to-French reflex prediction task, using a newly compiled dataset FLLex with 1368 paired Latin/French forms. We also present, FLLAPS, which maps 310 Latin reflexes through five stages until Modern French, derived from Pope (1934)’s sound tables. Our publicly available rule cascades include the baselines BaseCLEF and BaseCLEF*, representing the received view of Latin to French development, and DiaCLEF, build by incremental corrections to BaseCLEF aided by DiaSim’s diagnostics. DiaCLEF vastly outperforms the baselines, improving final accuracy on FLLex from 3.2%to 84.9%, and similar improvements across FLLAPS’ stages.
Anthology ID:
2020.lt4hala-1.5
Volume:
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LT4HALA
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
28–36
Language:
English
URL:
https://aclanthology.org/2020.lt4hala-1.5
DOI:
Bibkey:
Cite (ACL):
Clayton Marr and David R. Mortensen. 2020. Computerized Forward Reconstruction for Analysis in Diachronic Phonology, and Latin to French Reflex Prediction. In Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages, pages 28–36, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):
Computerized Forward Reconstruction for Analysis in Diachronic Phonology, and Latin to French Reflex Prediction (Marr & Mortensen, LT4HALA 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/author-url/2020.lt4hala-1.5.pdf