Abstract
Transformer-based language models (TLMs), such as BERT, ALBERT and GPT-3, have shown strong performance in a wide range of NLP tasks and currently dominate the field of NLP. However, many researchers wonder whether these models can maintain their dominance forever. Of course, we do not have answers now, but, as an attempt to find better neural architectures and training schemes, we pretrain a simple CNN using a GAN-style learning scheme and Wikipedia data, and then integrate it with standard TLMs. We show that on the GLUE tasks, the combination of our pretrained CNN with ALBERT outperforms the original ALBERT and achieves a similar performance to that of SOTA. Furthermore, on open-domain QA (Quasar-T and SearchQA), the combination of the CNN with ALBERT or RoBERTa achieved stronger performance than SOTA and the original TLMs. We hope that this work provides a hint for developing a novel strong network architecture along with its training scheme. Our source code and models are available at https://github.com/nict-wisdom/bertac.
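The abstract only gestures at what "GAN-style" pretraining of a CNN encoder might involve. The PyTorch sketch below is a minimal, generic illustration of that idea: a 1-D CNN sentence encoder trained adversarially against a discriminator so that encodings of an entity-masked sentence become hard to tell apart from encodings of the original sentence. All class and function names (`SimpleCNNEncoder`, `Discriminator`, `gan_pretrain_step`) and the exact masking objective are assumptions for illustration; the paper's actual pretraining objective and its integration with the TLM are described in the full text.

```python
# Hypothetical sketch of GAN-style pretraining for a CNN sentence encoder.
# Not the authors' implementation; names and objective are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNNEncoder(nn.Module):
    """1-D CNN over token embeddings producing a fixed-size sentence vector."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)   # (batch, emb_dim, seq_len)
        h = F.relu(self.conv(x))                  # (batch, hidden, seq_len)
        return h.max(dim=2).values                # max-pool over time

class Discriminator(nn.Module):
    """Scores whether a sentence vector comes from the original or masked view."""
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, vec):
        return self.mlp(vec).squeeze(-1)

def gan_pretrain_step(gen, disc, real_ids, masked_ids, g_opt, d_opt):
    """One adversarial step: the discriminator learns to separate encodings of
    the original sentence from encodings of a masked variant, while the CNN
    (generator) is updated to make the two indistinguishable."""
    batch = real_ids.size(0)
    ones, zeros = torch.ones(batch), torch.zeros(batch)

    # Discriminator update (generator frozen via detach).
    real_vec = gen(real_ids).detach()
    fake_vec = gen(masked_ids).detach()
    d_loss = (F.binary_cross_entropy_with_logits(disc(real_vec), ones) +
              F.binary_cross_entropy_with_logits(disc(fake_vec), zeros))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: encode the masked sentence so it "fools" the discriminator.
    fake_vec = gen(masked_ids)
    g_loss = F.binary_cross_entropy_with_logits(disc(fake_vec), ones)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

After pretraining, the frozen CNN's sentence vectors could be fed to a standard TLM (e.g., ALBERT or RoBERTa) as additional features for fine-tuning; how BERTAC actually fuses the two encoders is specified in the paper and the released code.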
- Anthology ID: 2021.acl-long.164
- Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month: August
- Year: 2021
- Address: Online
- Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
- Venues: ACL | IJCNLP
- Publisher: Association for Computational Linguistics
- Pages: 2103–2115
- URL: https://aclanthology.org/2021.acl-long.164
- DOI: 10.18653/v1/2021.acl-long.164
- Cite (ACL): Jong-Hoon Oh, Ryu Iida, Julien Kloetzer, and Kentaro Torisawa. 2021. BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2103–2115, Online. Association for Computational Linguistics.
- Cite (Informal): BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks (Oh et al., ACL-IJCNLP 2021)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2021.acl-long.164.pdf
- Code: nict-wisdom/bertac
- Data: GLUE, QNLI, QUASAR, QUASAR-T, SearchQA