Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition

Shuguang Chen, Leonardo Neves, Thamar Solorio


Abstract
In this work, we take the named entity recognition task in the English language as a case study and explore style transfer as a data augmentation method to increase the size and diversity of training data in low-resource scenarios. We propose a new method to effectively transform the text from a high-resource domain to a low-resource domain by changing its style-related attributes to generate synthetic data for training. Moreover, we design a constrained decoding algorithm along with a set of key ingredients for data selection to guarantee the generation of valid and coherent data. Experiments and analysis on five different domain pairs under different data regimes demonstrate that our approach can significantly improve results compared to current state-of-the-art data augmentation methods. Our approach is a practical solution to data scarcity, and we expect it to be applicable to other NLP tasks.
Anthology ID:
2022.emnlp-main.120
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1827–1841
URL:
https://aclanthology.org/2022.emnlp-main.120
Cite (ACL):
Shuguang Chen, Leonardo Neves, and Thamar Solorio. 2022. Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1827–1841, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition (Chen et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-ingestion/2022.emnlp-main.120.pdf