Speeding Up Transformer Training By Using Dataset Subsampling - An Exploratory Analysis
Lovre Torbarina, Velimir Mihelčić, Bruno Šarlija, Lukasz Roguski, Željko Kraljević
Abstract
Transformer-based models have greatly advanced the field of natural language processing and, while they achieve state-of-the-art results on a wide range of tasks, they carry a large number of parameters. Consequently, even when a pre-trained transformer model is fine-tuned on a given task, a large dataset can make fine-tuning infeasible within a reasonable time. For this reason, we empirically test 8 subsampling methods for reducing dataset size on a text classification task and report the trade-off between metric score and training time. 7 of the 8 methods are simple heuristics, while the last is CRAIG, a coreset-construction method for data-efficient model training. We obtain the best result with CRAIG, which loses on average only 0.03 points of F-score on the test set while speeding up training by 63.93% on average, relative to the score and time obtained using the full dataset. Lastly, we show the trade-off between speed and performance for all sampling methods on three different datasets.
- Anthology ID:
- 2021.sustainlp-1.11
- Volume:
- Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Virtual
- Editors:
- Nafise Sadat Moosavi, Iryna Gurevych, Angela Fan, Thomas Wolf, Yufang Hou, Ana Marasović, Sujith Ravi
- Venue:
- sustainlp
- Publisher:
- Association for Computational Linguistics
- Pages:
- 86–95
- URL:
- https://aclanthology.org/2021.sustainlp-1.11
- DOI:
- 10.18653/v1/2021.sustainlp-1.11
- Cite (ACL):
- Lovre Torbarina, Velimir Mihelčić, Bruno Šarlija, Lukasz Roguski, and Željko Kraljević. 2021. Speeding Up Transformer Training By Using Dataset Subsampling - An Exploratory Analysis. In Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing, pages 86–95, Virtual. Association for Computational Linguistics.
- Cite (Informal):
- Speeding Up Transformer Training By Using Dataset Subsampling - An Exploratory Analysis (Torbarina et al., sustainlp 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.sustainlp-1.11.pdf