Abstract
Pre-trained language representation models such as BERT and RoBERTa have recently achieved significant results on a wide range of natural language processing (NLP) tasks, but they require extremely high computational costs. Curriculum Learning (CL) is one potential solution to this problem. CL is a training strategy in which training samples are presented to a model in a meaningful order rather than by random sampling. In this work, we propose a new CL method that gradually increases the block-size of the input text used to train the self-attention mechanism of BERT and its variants, while using the maximum available batch-size at each stage. Experiments in low-resource settings show that our approach outperforms the baseline in both convergence speed and final performance on downstream tasks.
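The abstract describes the curriculum only at a high level. The sketch below is a hypothetical illustration, not the authors' implementation, of how such a block-size curriculum could be scheduled: the block-size grows in stages while the batch-size is set to the largest value fitting a fixed token budget, mirroring the idea of always using the maximum available batch-size. The stage block-sizes, total step count, and `tokens_per_batch` budget are assumptions made for illustration.

```python
# Illustrative sketch of a block-size curriculum for BERT pre-training.
# The stage boundaries, block-sizes, and batch-size rule below are
# hypothetical assumptions, not the published configuration.

def block_size_curriculum(total_steps, stages=(64, 128, 256, 512),
                          tokens_per_batch=65536):
    """Yield (step, block_size, batch_size) over a pre-training run.

    The run is split evenly across the given block-sizes; within each
    stage the batch-size is the largest value that keeps the number of
    tokens per batch roughly constant, i.e. the maximum batch that fits
    the same token budget as the block-size grows.
    """
    steps_per_stage = total_steps // len(stages)
    for step in range(total_steps):
        stage = min(step // steps_per_stage, len(stages) - 1)
        block_size = stages[stage]
        batch_size = max(1, tokens_per_batch // block_size)
        yield step, block_size, batch_size


if __name__ == "__main__":
    # Print the schedule at stage boundaries for a short hypothetical run.
    schedule = list(block_size_curriculum(total_steps=1000))
    for step, block_size, batch_size in schedule[::250]:
        print(f"step {step:4d}: block_size={block_size:4d}, batch_size={batch_size}")
```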
- Anthology ID: 2021.ranlp-1.112
- Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month: September
- Year: 2021
- Address: Held Online
- Venue: RANLP
- Publisher: INCOMA Ltd.
- Pages: 989–996
- URL: https://aclanthology.org/2021.ranlp-1.112
- Cite (ACL): Koichi Nagatsuka, Clifford Broni-Bediako, and Masayasu Atsumi. 2021. Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 989–996, Held Online. INCOMA Ltd.
- Cite (Informal): Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text (Nagatsuka et al., RANLP 2021)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/2021.ranlp-1.112.pdf
- Data: GLUE, QNLI, WikiText-2