Abstract
This paper evaluates various character alignment methods on the task of sentence-level standardization of dialect transcriptions. We compare alignment methods from different scientific traditions (dialectometry, speech processing, machine translation) and apply them to Finnish, Norwegian and Swiss German dialect datasets. In the absence of gold alignments, we evaluate the methods on a set of characteristics that are deemed undesirable for the task. We find that trained alignment methods show only marginal benefits over simple Levenshtein distance. On this particular task, eflomal outperforms related methods such as GIZA++ or fast_align by a large margin.
- Anthology ID:
- 2023.sigmorphon-1.12
- Volume:
- Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Garrett Nicolai, Eleanor Chodroff, Frederic Mailhot, Çağrı Çöltekin
- Venue:
- SIGMORPHON
- SIG:
- SIGMORPHON
- Publisher:
- Association for Computational Linguistics
- Pages:
- 110–116
- URL:
- https://aclanthology.org/2023.sigmorphon-1.12
- DOI:
- 10.18653/v1/2023.sigmorphon-1.12
- Cite (ACL):
- Yves Scherrer. 2023. Character alignment methods for dialect-to-standard normalization. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 110–116, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Character alignment methods for dialect-to-standard normalization (Scherrer, SIGMORPHON 2023)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2023.sigmorphon-1.12.pdf
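The abstract uses plain Levenshtein distance as the untrained baseline that trained aligners (GIZA++, fast_align, eflomal) barely improve on. To illustrate what a Levenshtein-based character alignment between a dialect form and its standard form looks like, here is a minimal Python sketch. It assumes unit costs for substitution, insertion and deletion and breaks ties in favour of diagonal (match/substitution) moves. The function name `levenshtein_align` and the example pair are hypothetical illustrations, not the paper's implementation or data.

```python
from typing import List, Optional, Tuple

# An aligned pair: (dialect char, standard char); None marks a gap.
Pair = Tuple[Optional[str], Optional[str]]


def levenshtein_align(src: str, tgt: str) -> Tuple[int, List[Pair]]:
    """Hypothetical helper: unit-cost Levenshtein alignment of two strings.

    Returns the edit distance and one optimal character alignment.
    """
    n, m = len(src), len(tgt)
    # dist[i][j] = edit distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # deletion of a source char
                dist[i][j - 1] + 1,        # insertion of a target char
            )

    # Backtrace from the bottom-right cell, preferring diagonal moves on ties.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                pairs.append((src[i - 1], tgt[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((src[i - 1], None))  # source char aligned to a gap
            i -= 1
        else:
            pairs.append((None, tgt[j - 1]))  # gap aligned to target char
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


if __name__ == "__main__":
    # Illustrative pair: Swiss German "huus" vs. Standard German "haus".
    print(levenshtein_align("huus", "haus"))
```

For "huus" and "haus" the sketch returns distance 1 and pairs each character one-to-one, with the single substitution u→a in second position. A trained aligner would instead learn such correspondences from the corpus rather than from fixed unit costs.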