Augmenting NLP models using Latent Feature Interpolations

Amit Jindal, Arijit Ghosh Chowdhury, Aniket Didolkar, Di Jin, Ramit Sawhney, Rajiv Ratn Shah


Abstract
Models with a large number of parameters are prone to overfitting and often fail to capture the underlying input distribution. We introduce Emix, a data augmentation method that interpolates word embeddings and hidden-layer representations to construct virtual examples. We show that Emix yields significant improvements over previously used interpolation-based regularizers and data augmentation techniques. We also demonstrate that our proposed method is more robust to sparsification, and we highlight its merits through thorough quantitative and qualitative assessments.
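The abstract's core idea, constructing virtual training examples by interpolating two inputs (and their labels) in embedding or hidden space, belongs to the mixup family of augmentations. Below is a minimal, hypothetical PyTorch sketch of embedding-level interpolation for illustration only; it is not the authors' Emix implementation, and all names (`TextClassifier`, `interpolation_step`) are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextClassifier(nn.Module):
    """Toy encoder-classifier, used only to show where the mixing happens."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids, partner_ids=None, lam=1.0):
        emb = self.embed(token_ids)
        if partner_ids is not None:
            # Construct a "virtual example" by interpolating word embeddings
            # of the two inputs with coefficient lam.
            emb = lam * emb + (1.0 - lam) * self.embed(partner_ids)
        _, (h_n, _) = self.encoder(emb)   # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])

def interpolation_step(model, x, y, num_classes, alpha=0.2):
    """One training step on mixed inputs and correspondingly mixed labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))      # random partner for each example
    logits = model(x, partner_ids=x[perm], lam=lam)
    y_soft = F.one_hot(y, num_classes).float()
    y_mix = lam * y_soft + (1.0 - lam) * y_soft[perm]  # mix labels too
    log_probs = F.log_softmax(logits, dim=-1)
    return -(y_mix * log_probs).sum(dim=-1).mean()     # soft cross-entropy
```

The same interpolation could instead be applied to intermediate hidden states rather than the embedding layer; the abstract indicates the method operates on both, but the exact mixing scheme is specified in the paper itself, not here.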
Anthology ID:
2020.coling-main.611
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
6931–6936
URL:
https://aclanthology.org/2020.coling-main.611
DOI:
10.18653/v1/2020.coling-main.611
Cite (ACL):
Amit Jindal, Arijit Ghosh Chowdhury, Aniket Didolkar, Di Jin, Ramit Sawhney, and Rajiv Ratn Shah. 2020. Augmenting NLP models using Latent Feature Interpolations. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6931–6936, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Augmenting NLP models using Latent Feature Interpolations (Jindal et al., COLING 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.611.pdf
Data
SST, SST-2