Abstract
Chinese Spelling Check (CSC) aims to detect and correct Chinese spelling errors. Many models use a predefined confusion set to learn a mapping between correct characters and their visually or phonetically similar misuses, but the mapping may be out-of-domain. To that end, we propose SpellBERT, a pretrained model with graph-based extra features that is independent of the confusion set. To explicitly capture the two error patterns, we employ a graph neural network to introduce radical and pinyin information as visual and phonetic features. To better fuse these features with character representations, we devise pre-training tasks similar to masked language modeling. With this feature-rich pre-training, SpellBERT, at only half the size of BERT, shows competitive performance and achieves a state-of-the-art result on the OCR dataset, where most of the errors are not covered by the existing confusion set.
- Anthology ID:
- 2021.emnlp-main.287
- Original:
- 2021.emnlp-main.287v1
- Version 2:
- 2021.emnlp-main.287v2
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3544–3551
- URL:
- https://aclanthology.org/2021.emnlp-main.287
- DOI:
- 10.18653/v1/2021.emnlp-main.287
- Cite (ACL):
- Tuo Ji, Hang Yan, and Xipeng Qiu. 2021. SpellBERT: A Lightweight Pretrained Model for Chinese Spelling Check. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3544–3551, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- SpellBERT: A Lightweight Pretrained Model for Chinese Spelling Check (Ji et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/landing_page/2021.emnlp-main.287.pdf
- Code:
- benbijituo/spellbert