Abstract
Pre-trained language representation models such as BERT and RoBERTa have recently achieved significant results on a wide range of natural language processing (NLP) tasks, but they require extremely high computational costs. Curriculum Learning (CL) is one potential solution to this problem. CL is a training strategy in which training samples are presented to a model in a meaningful order rather than by random sampling. In this work, we propose a new CL method that gradually increases the block-size of the input text used to train the self-attention mechanism of BERT and its variants, while using the maximum available batch-size at each stage. Experiments in low-resource settings show that our approach outperforms the baseline in both convergence speed and final performance on downstream tasks.
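The abstract describes the curriculum only at a high level. The sketch below is a hypothetical illustration, not the authors' implementation, of how such a block-size curriculum could be scheduled: the block-size grows in stages while the batch-size is set to the largest value fitting a fixed token budget, mirroring the idea of always using the maximum available batch-size. The stage block-sizes, total step count, and `tokens_per_batch` budget are assumptions made for illustration.

```python
# Illustrative sketch of a block-size curriculum for BERT pre-training.
# The stage boundaries, block-sizes, and batch-size rule below are
# hypothetical assumptions, not the published configuration.

def block_size_curriculum(total_steps, stages=(64, 128, 256, 512),
                          tokens_per_batch=65536):
    """Yield (step, block_size, batch_size) over a pre-training run.

    The run is split evenly across the given block-sizes; within each
    stage the batch-size is the largest value that keeps the number of
    tokens per batch roughly constant, i.e. the maximum batch that fits
    the same token budget as the block-size grows.
    """
    steps_per_stage = total_steps // len(stages)
    for step in range(total_steps):
        stage = min(step // steps_per_stage, len(stages) - 1)
        block_size = stages[stage]
        batch_size = max(1, tokens_per_batch // block_size)
        yield step, block_size, batch_size


if __name__ == "__main__":
    # Print the schedule at stage boundaries for a short hypothetical run.
    schedule = list(block_size_curriculum(total_steps=1000))
    for step, block_size, batch_size in schedule[::250]:
        print(f"step {step:4d}: block_size={block_size:4d}, batch_size={batch_size}")
```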
- Anthology ID: 2021.ranlp-1.112
- Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month: September
- Year: 2021
- Address: Held Online
- Venue: RANLP
- Publisher: INCOMA Ltd.
- Pages: 989–996
- URL: https://aclanthology.org/2021.ranlp-1.112
- Cite (ACL): Koichi Nagatsuka, Clifford Broni-Bediako, and Masayasu Atsumi. 2021. Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 989–996, Held Online. INCOMA Ltd.
- Cite (Informal): Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text (Nagatsuka et al., RANLP 2021)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/2021.ranlp-1.112.pdf
- Data: GLUE, QNLI, WikiText-2