Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text

Koichi Nagatsuka, Clifford Broni-Bediako, Masayasu Atsumi


Abstract
Recently, pre-trained language representation models such as BERT and RoBERTa have achieved significant results in a wide range of natural language processing (NLP) tasks; however, pre-training them requires an extremely high computational cost. Curriculum Learning (CL) is one potential solution to alleviate this problem. CL is a training strategy in which training samples are given to models in a meaningful order instead of by random sampling. In this work, we propose a new CL method that gradually increases the block-size of input text for training the self-attention mechanism of BERT and its variants, using the maximum available batch-size. Experiments in low-resource settings show that our approach outperforms the baseline in terms of both convergence speed and final performance on downstream tasks.
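
To make the described training strategy concrete, the sketch below shows one way such a block-size curriculum could be organized. It is a minimal sketch, not the authors' implementation: the stage block sizes, batch sizes, step counts, and the helper `train_mlm_steps` are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of a block-size curriculum:
# pre-training proceeds in stages; each stage re-chunks the token stream
# into longer blocks and uses the largest batch size assumed to fit in
# memory for that block length.

from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class CurriculumStage:
    block_size: int   # tokens per input block fed to the self-attention layers
    batch_size: int   # maximum batch size available at this block size
    steps: int        # number of pre-training steps spent at this stage

# Shorter blocks first; block size grows and batch size shrinks accordingly.
# These values are illustrative assumptions, not the paper's settings.
CURRICULUM = [
    CurriculumStage(block_size=64,  batch_size=256, steps=10_000),
    CurriculumStage(block_size=128, batch_size=128, steps=10_000),
    CurriculumStage(block_size=256, batch_size=64,  steps=10_000),
    CurriculumStage(block_size=512, batch_size=32,  steps=10_000),
]

def chunk_into_blocks(token_ids: Sequence[int], block_size: int) -> List[List[int]]:
    """Split a flat token stream into contiguous, fixed-length blocks."""
    return [list(token_ids[i:i + block_size])
            for i in range(0, len(token_ids) - block_size + 1, block_size)]

def pretrain_with_curriculum(model, token_ids, curriculum=CURRICULUM):
    """Run masked-language-model pre-training stage by stage."""
    for stage in curriculum:
        blocks = chunk_into_blocks(token_ids, stage.block_size)
        # `train_mlm_steps` stands in for an ordinary MLM training loop
        # (hypothetical helper, not part of any library).
        train_mlm_steps(model, blocks,
                        batch_size=stage.batch_size,
                        num_steps=stage.steps)
    return model
```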
Anthology ID:
2021.ranlp-1.112
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
989–996
URL:
https://aclanthology.org/2021.ranlp-1.112
Cite (ACL):
Koichi Nagatsuka, Clifford Broni-Bediako, and Masayasu Atsumi. 2021. Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 989–996, Held Online. INCOMA Ltd.
Cite (Informal):
Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text (Nagatsuka et al., RANLP 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.ranlp-1.112.pdf
Data
GLUE, QNLI, WikiText-2