Character alignment methods for dialect-to-standard normalization

Yves Scherrer


Abstract
This paper evaluates various character alignment methods on the task of sentence-level standardization of dialect transcriptions. We compare alignment methods from different scientific traditions (dialectometry, speech processing, machine translation) and apply them to Finnish, Norwegian and Swiss German dialect datasets. In the absence of gold alignments, we evaluate the methods on a set of characteristics that are deemed undesirable for the task. We find that trained alignment methods show only marginal benefits over simple Levenshtein distance. On this particular task, eflomal outperforms related methods such as GIZA++ or fast_align by a large margin.
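For readers unfamiliar with the Levenshtein baseline mentioned in the abstract, the following is a minimal sketch of how a character alignment can be read off the edit-distance table. It is not the paper's implementation: the function name `levenshtein_align`, the unit costs for insertion, deletion and substitution, and the toy input strings are all assumptions made for illustration. The paper works at sentence level, whereas this sketch simply aligns two short strings.

```python
def levenshtein_align(source, target):
    """Return (distance, pairs) for two strings.

    pairs is a list of (source_char, target_char) tuples; an empty
    string on either side marks a gap (insertion or deletion).
    Unit costs are an assumption of this sketch.
    """
    n, m = len(source), len(target)

    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # deletion from source
                dist[i][j - 1] + 1,        # insertion into source
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if source[i - 1] == target[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                pairs.append((source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((source[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", target[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


# Toy strings, not taken from the paper's datasets.
distance, pairs = levenshtein_align("kitten", "sitting")
print(distance)  # 3
for src_char, tgt_char in pairs:
    print(repr(src_char), "->", repr(tgt_char))
```

When several alignments share the same minimal cost, the traceback above prefers the diagonal step, which is only one of several possible tie-breaking choices.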
Anthology ID:
2023.sigmorphon-1.12
Volume:
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Garrett Nicolai, Eleanor Chodroff, Frederic Mailhot, Çağrı Çöltekin
Venue:
SIGMORPHON
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
110–116
URL:
https://aclanthology.org/2023.sigmorphon-1.12
DOI:
10.18653/v1/2023.sigmorphon-1.12
Cite (ACL):
Yves Scherrer. 2023. Character alignment methods for dialect-to-standard normalization. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 110–116, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Character alignment methods for dialect-to-standard normalization (Scherrer, SIGMORPHON 2023)
PDF:
https://preview.aclanthology.org/naacl24-info/2023.sigmorphon-1.12.pdf
Video:
https://preview.aclanthology.org/naacl24-info/2023.sigmorphon-1.12.mp4