PEER: Pre-training ELECTRA Extended by Ranking

Ru He, Wei Wang, Songfang Huang, Fei Huang


Abstract
BERT and its variants have achieved strong results on many downstream natural language processing tasks, but these results come at a very high pre-training computation cost. To address this efficiency issue, the ELECTRA model uses a discriminator to perform the replaced token detection (RTD) task, that is, to classify whether each input token is original or has been replaced by a generator. RTD accelerates pre-training so substantially that it is very challenging to further improve on ELECTRA's pre-training efficiency by using or adding other pre-training tasks, as the recent comprehensive study of Bajaj et al. (2022) summarizes. To advance this pre-training efficiency frontier, in this paper we propose to extend the RTD task into a task of ranking input tokens according to K different quality levels. Essentially, we generalize the binary classifier in ELECTRA into a K-level ranker that undertakes a more precise task at negligible additional computation cost. Our extensive experiments show that, given the same computation cost, the proposed method outperforms state-of-the-art efficient pre-training models, including ELECTRA, on downstream GLUE tasks.
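To make the difference between the two training targets concrete, the following is a minimal sketch in plain Python. The binary RTD targets follow directly from the abstract's description. The abstract does not say how the K quality levels are assigned, so the bucketing rule in `k_level_labels` (ranking replaced tokens by the generator probability of the sampled replacement) is purely a hypothetical illustration and not the authors' method; the function names, the `generator_probs` argument, and the label encoding are likewise assumptions.

```python
from typing import List


def rtd_labels(original: List[str], corrupted: List[str]) -> List[int]:
    """Binary RTD targets. Encoding chosen here (an assumption):
    1 = token is original, 0 = token was replaced by the generator."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o == c) for o, c in zip(original, corrupted)]


def k_level_labels(
    original: List[str],
    corrupted: List[str],
    generator_probs: List[float],
    k: int = 3,
) -> List[int]:
    """Hypothetical K-level targets (NOT taken from the paper).

    Original tokens get the top level k - 1. Replaced tokens are spread
    over levels 0 .. k - 2 according to the probability the generator
    assigned to the sampled replacement: a more plausible replacement
    is treated as higher quality.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if not (len(original) == len(corrupted) == len(generator_probs)):
        raise ValueError("all inputs must have the same length")

    labels = []
    for o, c, p in zip(original, corrupted, generator_probs):
        if o == c:
            labels.append(k - 1)  # original token: highest quality level
        else:
            # Bucket p in [0, 1] into one of the k - 1 lower levels.
            labels.append(min(int(p * (k - 1)), k - 2))
    return labels


original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "car"]
# Generator probability of each sampled token; only used where replaced.
probs = [0.0, 0.0, 0.9, 0.0, 0.1]

binary = rtd_labels(original, corrupted)
ranked = k_level_labels(original, corrupted, probs, k=3)
print(binary)
print(ranked)
```

Under this assumed rule, setting `k=2` collapses the ranking targets back to the binary RTD targets, which mirrors the abstract's framing of the K-level ranker as a generalization of the binary classifier.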
Anthology ID:
2023.findings-acl.405
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6475–6491
URL:
https://preview.aclanthology.org/icon-24-ingestion/2023.findings-acl.405/
DOI:
10.18653/v1/2023.findings-acl.405
Cite (ACL):
Ru He, Wei Wang, Songfang Huang, and Fei Huang. 2023. PEER: Pre-training ELECTRA Extended by Ranking. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6475–6491, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
PEER: Pre-training ELECTRA Extended by Ranking (He et al., Findings 2023)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2023.findings-acl.405.pdf
Video:
https://preview.aclanthology.org/icon-24-ingestion/2023.findings-acl.405.mp4