Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction
Liviu P Dinu, Ana Sabina Uban, Alina Maria Cristea, Ioan-Bogdan Iordache, Teodor-George Marchitan, Simona Georgescu, Laurentiu Zoicas
Abstract
We introduce a new database of cognate words and etymons for the five main Romance languages, the most comprehensive one to date. We propose a strong benchmark for the automatic reconstruction of protowords for Romance languages, by applying a set of machine learning models and features on these data. The best results reach 90% accuracy in predicting the protoword of a given cognate set, surpassing existing state-of-the-art results for this task and showing that computational methods can be very useful in assisting linguists with protoword reconstruction.
- Anthology ID:
- 2024.emnlp-main.362
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6314–6326
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-main.362/
- DOI:
- 10.18653/v1/2024.emnlp-main.362
- Cite (ACL):
- Liviu P Dinu, Ana Sabina Uban, Alina Maria Cristea, Ioan-Bogdan Iordache, Teodor-George Marchitan, Simona Georgescu, and Laurentiu Zoicas. 2024. Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 6314–6326, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction (Dinu et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-main.362.pdf