Steven Y. Feng


2021

A Survey of Data Augmentation Approaches for NLP
Steven Y. Feng | Varun Gangal | Jason Wei | Sarath Chandar | Soroush Vosoughi | Teruko Mitamura | Eduard Hovy
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

SAPPHIRE: Approaches for Enhanced Concept-to-Text Generation
Steven Y. Feng | Jessica Huynh | Chaitanya Prasad Narisetty | Eduard Hovy | Varun Gangal
Proceedings of the 14th International Conference on Natural Language Generation

We motivate and propose a suite of simple but effective improvements for concept-to-text generation called SAPPHIRE: Set Augmentation and Post-hoc PHrase Infilling and REcombination. We demonstrate their effectiveness on generative commonsense reasoning, a.k.a. the CommonGen task, through experiments using both BART and T5 models. Through extensive automatic and human evaluation, we show that SAPPHIRE noticeably improves model performance. An in-depth qualitative analysis illustrates that SAPPHIRE effectively addresses many issues of the baseline model generations, including lack of commonsense, insufficient specificity, and poor fluency.
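For a flavor of the set-augmentation idea, below is a minimal sketch that expands a CommonGen-style concept set with hypernym lemmas from WordNet before handing it to a seq2seq generator such as BART or T5. The WordNet-based expansion and the function name are illustrative assumptions, not the paper's actual augmentation strategy.

```python
# Hypothetical sketch of concept-set augmentation for a CommonGen-style
# input (assumption: WordNet-based expansion; the paper's own set
# augmentation may differ). Requires nltk with the 'wordnet' corpus:
#   import nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def augment_concept_set(concepts, max_extra_per_concept=1):
    """Expand a concept set with hypernym lemmas from WordNet."""
    augmented = list(concepts)
    for concept in concepts:
        added = 0
        for synset in wn.synsets(concept):
            for hypernym in synset.hypernyms():
                lemma = hypernym.lemma_names()[0].replace("_", " ")
                if lemma not in augmented:
                    augmented.append(lemma)
                    added += 1
                if added >= max_extra_per_concept:
                    break
            if added >= max_extra_per_concept:
                break
    return augmented

# The enlarged set gives the generator more lexical material to work with:
print(augment_concept_set(["dog", "frisbee", "catch"]))
# e.g. ['dog', 'frisbee', 'catch', 'canine', 'disk', 'touch']
```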

2020

GenAug: Data Augmentation for Finetuning Text Generators
Steven Y. Feng | Varun Gangal | Dongyeop Kang | Teruko Mitamura | Eduard Hovy
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text, including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that generation quality peaks at approximately three times the amount of original data.
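To make the two highlighted augmentations concrete, here is a minimal sketch of character-level synthetic noise and hypernym replacement. The function names and noise scheme are illustrative assumptions, not the GenAug codebase; the hypernym step assumes nltk with the WordNet corpus downloaded.

```python
# Illustrative sketch (not the GenAug code) of two augmentations named
# in the abstract: character-level synthetic noise and replacing a
# keyword with a WordNet hypernym.
# Requires: import nltk; nltk.download('wordnet')
import random
from nltk.corpus import wordnet as wn

def add_char_noise(text, noise_rate=0.05):
    """Randomly replace characters with random lowercase letters."""
    chars = list(text)
    for i, c in enumerate(chars):
        if c.isalpha() and random.random() < noise_rate:
            chars[i] = random.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

def replace_with_hypernym(word):
    """Swap `word` for a WordNet hypernym, falling back to `word`."""
    for synset in wn.synsets(word):
        hypernyms = synset.hypernyms()
        if hypernyms:
            return hypernyms[0].lemma_names()[0].replace("_", " ")
    return word

review = "The pizza here is amazing and the staff are friendly."
print(add_char_noise(review))          # noisy variant of the review
print(replace_with_hypernym("pizza"))  # e.g. 'dish'
```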

2019

Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange
Steven Y. Feng | Aaron W. Li | Jesse Hoey
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper, we present a novel method for measurably adjusting the semantics of text while preserving its sentiment and fluency, a task we call semantic text exchange. This is useful for text data augmentation and the semantic correction of text generated by chatbots and virtual assistants. We introduce a pipeline called SMERTI that combines entity replacement, similarity masking, and text infilling. We measure our pipeline’s success by its Semantic Text Exchange Score (STES): the ability to preserve the original text’s sentiment and fluency while adjusting semantic content. We propose using the masking (replacement) rate threshold as an adjustable parameter to control the amount of semantic change in the text. Our experiments demonstrate that SMERTI can outperform baseline models on Yelp reviews, Amazon reviews, and news headlines.
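To illustrate the three-stage pipeline, below is a minimal, hypothetical sketch of entity replacement, similarity masking capped by a masking rate threshold, and masked-LM infilling. The hand-given `related_words` set, the RoBERTa fill-mask model, and all function names are assumptions standing in for SMERTI's own similarity-masking and infilling components.

```python
# Hypothetical SMERTI-style pipeline sketch (assumptions: a hand-given
# related-word set and a RoBERTa fill-mask model stand in for the
# paper's similarity-masking and infilling modules).
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")
MASK = fill.tokenizer.mask_token  # "<mask>" for RoBERTa

def semantic_text_exchange(text, old_entity, new_entity,
                           related_words, mask_rate=0.2):
    # 1) Entity replacement.
    text = text.replace(old_entity, new_entity)
    # 2) Similarity masking: mask words tied to the old entity, capped
    #    by the adjustable masking (replacement) rate threshold.
    tokens = text.split()
    budget = max(1, int(mask_rate * len(tokens)))
    for i, tok in enumerate(tokens):
        if budget == 0:
            break
        if tok.strip(".,!?").lower() in related_words:
            tokens[i] = MASK
            budget -= 1
    text = " ".join(tokens)
    # 3) Text infilling: fill masks left to right with the MLM's top pick.
    while MASK in text:
        preds = fill(text, top_k=1)
        # Single mask -> list of dicts; multiple masks -> list of lists.
        top = preds[0][0] if isinstance(preds[0], list) else preds[0]
        text = top["sequence"]
    return text

print(semantic_text_exchange(
    "The pizza was delicious and the service was great.",
    old_entity="pizza", new_entity="hotel",
    related_words={"delicious"}))
# e.g. "The hotel was nice and the service was great."
```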