Abstract
We propose FASPell, a Chinese spell checker based on a new paradigm that consists of a denoising autoencoder (DAE) and a decoder. Compared with previous state-of-the-art models, the new paradigm allows our spell checker to be Faster in computation, readily Adaptable to both simplified and traditional Chinese texts produced by either humans or machines, and Simpler in structure while remaining just as Powerful in both error detection and correction. These four achievements are possible because the new paradigm circumvents two bottlenecks. First, the DAE reduces the amount of Chinese spell checking data needed for supervised learning (to <10k sentences) by leveraging masked language models pre-trained without supervision, as in BERT, XLNet, MASS, etc. Second, the decoder helps eliminate the need for a confusion set, which is inflexible and insufficient for exploiting the salient feature of Chinese character similarity.
- Anthology ID:
- D19-5522
- Volume:
- Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
- Venue:
- WNUT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 160–169
- URL:
- https://aclanthology.org/D19-5522
- DOI:
- 10.18653/v1/D19-5522
- Cite (ACL):
- Yuzhong Hong, Xianguo Yu, Neng He, Nan Liu, and Junhui Liu. 2019. FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 160–169, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm (Hong et al., WNUT 2019)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/D19-5522.pdf
- Code:
- iqiyi/FASPell
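
The two-stage paradigm summarized in the abstract (a DAE that proposes candidate characters, then a decoder that filters them using character similarity instead of a confusion set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy similarity table, and the two fixed thresholds are hypothetical stand-ins and are not taken from the iqiyi/FASPell code. The candidate lists would come from a masked language model in practice; here they are written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical toy similarity scores in [0, 1]. A real system would compute
# similarity from character shape and pronunciation; this table is made up.
TOY_SIMILARITY: Dict[Tuple[str, str], float] = {
    ("因", "应"): 0.8,  # close in pronunciation (yin / ying)
    ("因", "国"): 0.3,
}


def similarity(a: str, b: str) -> float:
    """Look up a symmetric similarity score; unknown pairs score 0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get((a, b), TOY_SIMILARITY.get((b, a), 0.0))


def decode(
    sentence: str,
    candidates: List[List[Tuple[str, float]]],
    min_conf: float = 0.5,  # assumed threshold, not from the paper
    min_sim: float = 0.5,   # assumed threshold, not from the paper
) -> str:
    """Pick one character per position from ranked (char, confidence) lists.

    A candidate replaces the original character only if the model is
    confident enough AND the candidate is similar enough to the original.
    """
    output = []
    for original, ranked in zip(sentence, candidates):
        choice = original
        for char, conf in ranked:  # assumed sorted by confidence, descending
            if char == original:
                break  # the model agrees with the input; keep it
            if conf >= min_conf and similarity(original, char) >= min_sim:
                choice = char
                break
        output.append(choice)
    return "".join(output)


# Hand-written candidates standing in for masked language model output.
corrected = decode("因该", [[("应", 0.9), ("因", 0.05)], [("该", 0.99)]])
print(corrected)
```

In this sketch the similarity check is what replaces a fixed confusion set: any character the model proposes can be accepted, provided it is similar enough to the original. A confident but dissimilar candidate such as 国 is rejected, so the input character is kept.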