Abstract
The goal of whitespace correction is to fix space errors in arbitrary given text. For example, given the text “whi te space correctio nwithTransf or mers”, produce “whitespace correction with Transformers”. We compare two Transformer-based models: a character-level encoder-decoder model and a byte-level encoder-only model. We find that the encoder-only model is faster and achieves higher quality. We provide an easy-to-use tool that is over 900 times faster than the previous best tool, with the same high quality. Our tool repairs text at a rate of over 200 kB/s on GPU, with a sequence-averaged F1-score ranging from 87.5% for hard-to-correct text up to 99% for text without any spaces.
- Anthology ID:
- 2023.acl-demo.37
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Danushka Bollegala, Ruihong Huang, Alan Ritter
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 389–399
- URL:
- https://aclanthology.org/2023.acl-demo.37
- DOI:
- 10.18653/v1/2023.acl-demo.37
- Cite (ACL):
- Hannah Bast, Matthias Hertel, and Sebastian Walter. 2023. Fast Whitespace Correction with Encoder-Only Transformers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 389–399, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Fast Whitespace Correction with Encoder-Only Transformers (Bast et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2023.acl-demo.37.pdf
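To make the task and the metric named in the abstract concrete, the following is a minimal sketch of how an F1-score over whitespace edits could be computed for a single sequence. It is an illustration under stated assumptions, not the paper's evaluation code: the function names (`space_positions`, `edit_ops`, `whitespace_f1`) are hypothetical, and the scoring scheme (comparing the set of space insertions and deletions a prediction makes against the set the ground truth requires) is one plausible reading of "F1-score" for this task. The exact definition used by the authors may differ.

```python
def space_positions(text: str) -> set:
    """Return every index i such that a space follows the i-th non-space character."""
    positions = set()
    n_chars = 0
    for ch in text.strip():
        if ch == " ":
            positions.add(n_chars)
        else:
            n_chars += 1
    return positions


def edit_ops(source: str, target: str) -> set:
    """Space insertions and deletions that turn `source` into `target`.

    Assumes both strings contain the same non-space characters in the same order.
    """
    src, tgt = space_positions(source), space_positions(target)
    inserts = {("insert", p) for p in tgt - src}
    deletes = {("delete", p) for p in src - tgt}
    return inserts | deletes


def whitespace_f1(noisy: str, predicted: str, truth: str) -> float:
    """F1 between the edits a prediction makes and the edits the truth requires (assumed metric)."""
    needed = edit_ops(noisy, truth)
    made = edit_ops(noisy, predicted)
    if not needed and not made:
        return 1.0  # nothing to fix and nothing changed
    true_positives = len(needed & made)
    # F1 = 2TP / (2TP + FP + FN) = 2TP / (|made| + |needed|)
    return 2 * true_positives / (len(needed) + len(made))


noisy = "whi te space correctio nwithTransf or mers"
truth = "whitespace correction with Transformers"
perfect = whitespace_f1(noisy, truth, truth)
partial = whitespace_f1(noisy, "whitespace correction withTransformers", truth)
```

Under this sketch, a sequence-averaged score would be the mean of `whitespace_f1` over all sequences in a benchmark. In the example, the partial prediction misses one of the seven required edits (the space between "with" and "Transformers"), so its score falls below that of the perfect prediction.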