Gated Transformer for Robust De-noised Sequence-to-Sequence Modelling
Ayan Sengupta, Amit Kumar, Sourabh Kumar Bhattacharjee, Suman Roy
Abstract
Robust sequence-to-sequence modelling is an essential task in the real world, where inputs are often noisy. Both user-generated and machine-generated inputs contain various kinds of noise in the form of spelling mistakes, grammatical errors, and character recognition errors, all of which impact downstream tasks and affect the interpretability of texts. In this work, we devise a novel sequence-to-sequence architecture for detecting and correcting different kinds of real-world and artificial noise (adversarial attacks) in English texts. To that end, we propose a modified Transformer-based encoder-decoder architecture that uses a gating mechanism to detect the types of corrections required and corrects texts accordingly. Experimental results show that our gated architecture with pre-trained language models performs significantly better than its non-gated counterparts and other state-of-the-art error correction models in correcting spelling and grammatical errors. Extrinsic evaluation of our model on Machine Translation (MT) and Summarization tasks shows the competitive performance of the model against other generative sequence-to-sequence models under noisy inputs.
- Anthology ID:
- 2021.findings-emnlp.309
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3645–3657
- URL:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.findings-emnlp.309/
- DOI:
- 10.18653/v1/2021.findings-emnlp.309
- Cite (ACL):
- Ayan Sengupta, Amit Kumar, Sourabh Kumar Bhattacharjee, and Suman Roy. 2021. Gated Transformer for Robust De-noised Sequence-to-Sequence Modelling. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3645–3657, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Gated Transformer for Robust De-noised Sequence-to-Sequence Modelling (Sengupta et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.findings-emnlp.309.pdf