2019
Neural Text Style Transfer via Denoising and Reranking
Joseph Lee | Ziang Xie | Cindy Wang | Max Drach | Dan Jurafsky | Andrew Ng
Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation
We introduce a simple method for text style transfer that frames style transfer as denoising: we synthesize a noisy corpus and treat the source style as a noisy version of the target style. To control for aspects such as preserving meaning while modifying style, we propose a reranking approach in the data synthesis phase. We evaluate our method on three novel style transfer tasks: transferring between British and American varieties, text genres (formal vs. casual), and lyrics from different musical genres. By measuring style transfer quality, meaning preservation, and the fluency of generated outputs, we demonstrate that our method is able to produce high-quality output while maintaining the flexibility to suggest syntactically rich stylistic edits.
2018
Noising and Denoising Natural Language: Diverse Backtranslation for Grammar Correction
Ziang Xie | Guillaume Genthial | Stanley Xie | Andrew Ng | Dan Jurafsky
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Translation-based methods for grammar correction that directly map noisy, ungrammatical text to its clean counterpart are able to correct a broad range of errors; however, such techniques are bottlenecked by the need for a large parallel corpus of noisy and clean sentence pairs. In this paper, we consider synthesizing parallel data by noising a clean monolingual corpus. While most previous approaches introduce perturbations using features computed from local context windows, we instead develop error generation processes using a neural sequence transduction model trained to translate clean examples to their noisy counterparts. Given a corpus of clean examples, we propose beam search noising procedures to synthesize additional noisy examples that human evaluators were nearly unable to discriminate from nonsynthesized examples. Surprisingly, when trained on additional data synthesized using our best-performing noising scheme, our model approaches the same performance as when trained on additional nonsynthesized data.
2015
Lexicon-Free Conversational Speech Recognition with Neural Networks
Andrew Maas | Ziang Xie | Dan Jurafsky | Andrew Ng
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies