Latin script keyboards for South Asian languages with finite-state normalization
Lawrence Wolf-Sonkin, Vlad Schogol, Brian Roark, Michael Riley
Abstract
The use of the Latin script for text entry of South Asian languages is common, even though there is no standard orthography for these languages in the script. We explore several compact finite-state architectures that permit variable spellings of words during mobile text entry. We find that approaches making use of transliteration transducers provide large accuracy improvements over baselines, but that simpler approaches involving a compact representation of many attested alternatives yield much of the accuracy gain. This is particularly important when operating under constraints on model size (e.g., on inexpensive mobile devices with limited storage and memory for keyboard models), and on speed of inference, since people typing on mobile keyboards expect no perceptual delay in keyboard responsiveness.
- Anthology ID:
- W19-3114
- Volume:
- Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing
- Month:
- September
- Year:
- 2019
- Address:
- Dresden, Germany
- Venue:
- FSMNLP
- SIG:
- SIGFSM
- Publisher:
- Association for Computational Linguistics
- Pages:
- 108–117
- URL:
- https://aclanthology.org/W19-3114
- DOI:
- 10.18653/v1/W19-3114
- Cite (ACL):
- Lawrence Wolf-Sonkin, Vlad Schogol, Brian Roark, and Michael Riley. 2019. Latin script keyboards for South Asian languages with finite-state normalization. In Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing, pages 108–117, Dresden, Germany. Association for Computational Linguistics.
- Cite (Informal):
- Latin script keyboards for South Asian languages with finite-state normalization (Wolf-Sonkin et al., FSMNLP 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W19-3114.pdf
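The abstract contrasts transliteration transducers with "a compact representation of many attested alternatives." The sketch below shows one simple way to picture the latter, under explicit assumptions: it is not the authors' implementation. It stores attested Latin-script spellings in a prefix tree (a deterministic acyclic acceptor, the simplest finite-state structure), so spellings that share a prefix share states. Each terminal node records the native-script word(s) that spelling was attested for. The class name `VariantTrie` and the Hindi romanizations are hypothetical, chosen only for illustration.

```python
class VariantTrie:
    """Hypothetical prefix tree over Latin-script spellings.

    Shared prefixes share nodes, which keeps many spelling variants compact.
    Each node's `outputs` holds the native-script words whose attested
    romanization ends exactly at that node.
    """

    def __init__(self):
        self.children = {}   # next Latin character -> child node
        self.outputs = set() # native-script words for the spelling ending here

    def add(self, latin, native):
        """Record that `latin` is an attested spelling of `native`."""
        node = self
        for ch in latin:
            node = node.children.setdefault(ch, VariantTrie())
        node.outputs.add(native)

    def _walk(self, prefix):
        """Follow `prefix` from this node; return None if it falls off the trie."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def lookup(self, latin):
        """Native words for which `latin` is an exact attested spelling."""
        node = self._walk(latin)
        return set(node.outputs) if node else set()

    def complete(self, prefix):
        """Native words reachable from a partially typed Latin prefix."""
        start = self._walk(prefix)
        if start is None:
            return set()
        found, stack = set(), [start]
        while stack:
            node = stack.pop()
            found |= node.outputs
            stack.extend(node.children.values())
        return found

    def num_nodes(self):
        """Total number of nodes (states), including this one."""
        return 1 + sum(child.num_nodes() for child in self.children.values())


# Illustrative (hypothetical) attested romanizations for three Hindi words.
attested = {
    "नहीं": ["nahi", "nahin", "nhi", "nai"],
    "मैं": ["main", "mai", "mein"],
    "में": ["mein", "me", "mai"],
}

lexicon = VariantTrie()
for native, spellings in attested.items():
    for spelling in spellings:
        lexicon.add(spelling, native)

print(lexicon.lookup("mein"))   # ambiguous: both "मैं" and "में"
print(lexicon.complete("na"))   # prefix completion while typing
print(lexicon.num_nodes())      # states needed for all distinct spellings
```

In this toy lexicon the eight distinct spellings total 28 Latin characters but need only 16 trie nodes, because variants such as "nahi"/"nahin" and "mai"/"main" share states. Ambiguous spellings ("mein", "mai") simply carry several outputs, leaving disambiguation to a downstream language model. A transliteration transducer would instead generalize to unseen spellings, at the cost of a larger model and slower inference.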