Phonetic Vector Representations for Sound Sequence Alignment

Pavel Sofroniev, Çağrı Çöltekin


Abstract
This study explores a number of data-driven vector representations of IPA-encoded sound segments for the purpose of sound sequence alignment. We evaluate the alternative representations by their alignment accuracy in the context of computational historical linguistics. We show that the data-driven methods consistently outperform linguistically motivated articulatory-acoustic features. However, the similarity scores obtained from the data-driven representations in a monolingual context perform worse than the state-of-the-art distance (or similarity) scoring methods proposed in earlier studies of computational historical linguistics. We also show that adapting the representations to the task at hand improves the results, yielding alignment accuracy comparable to that of state-of-the-art methods.
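The abstract does not spell out how segment vectors feed into alignment. As a minimal sketch, assuming a standard Needleman-Wunsch global alignment in which the substitution score of two segments is the cosine similarity of their vectors and every gap costs a fixed penalty, the pieces fit together as below. The segment vectors (`VECTORS`), the gap penalty (`GAP`), and the function names are hypothetical toy choices for illustration, not the representations or scoring evaluated in the paper.

```python
import math

# Hypothetical toy vectors for a few IPA segments; in practice these
# would be learned from data rather than written by hand.
VECTORS = {
    "p": (1.0, 0.1, 0.0),
    "b": (0.9, 0.3, 0.0),
    "t": (0.8, 0.0, 0.4),
    "a": (0.0, 1.0, 0.2),
    "e": (0.1, 0.9, 0.5),
}

GAP = -0.5  # assumed constant gap penalty


def cosine(u, v):
    """Cosine similarity between two segment vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def align(s1, s2, vectors=VECTORS, gap=GAP):
    """Needleman-Wunsch global alignment scored by vector similarity."""
    n, m = len(s1), len(s2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the table: each cell takes the best of substitution or a gap.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cosine(vectors[s1[i - 1]], vectors[s2[j - 1]])
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
                score[i][j],
                score[i - 1][j - 1] + cosine(vectors[s1[i - 1]], vectors[s2[j - 1]])):
            pairs.append((s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(score[i][j], score[i - 1][j] + gap):
            pairs.append((s1[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", s2[j - 1]))
            j -= 1
    return score[n][m], pairs[::-1]
```

For example, `align(["p", "a", "t"], ["b", "a", "t"])` pairs /p/ with /b/ because their toy vectors are close, while `align(["p", "a", "t"], ["p", "a"])` leaves /t/ opposite a gap. Swapping in a different representation only changes `VECTORS`; the alignment procedure stays the same, which is what makes the representations comparable by alignment accuracy.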
Anthology ID: W18-5812
Volume: Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month: October
Year: 2018
Address: Brussels, Belgium
Venue: EMNLP
SIG: SIGMORPHON
Publisher: Association for Computational Linguistics
Pages: 111–116
URL: https://aclanthology.org/W18-5812
DOI: 10.18653/v1/W18-5812
Cite (ACL): Pavel Sofroniev and Çağrı Çöltekin. 2018. Phonetic Vector Representations for Sound Sequence Alignment. In Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 111–116, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Phonetic Vector Representations for Sound Sequence Alignment (Sofroniev & Çöltekin, EMNLP 2018)
PDF: https://preview.aclanthology.org/auto-file-uploads/W18-5812.pdf