@inproceedings{chen-liu-2022-rethinking,
    title = "Rethinking Data Augmentation in Text-to-text Paradigm",
    author = "Chen, Yanan  and
      Liu, Yang",
    editor = "Calzolari, Nicoletta  and
      Huang, Chu-Ren  and
      Kim, Hansaem  and
      Pustejovsky, James  and
      Wanner, Leo  and
      Choi, Key-Sun  and
      Ryu, Pum-Mo  and
      Chen, Hsin-Hsi  and
      Donatelli, Lucia  and
      Ji, Heng  and
      Kurohashi, Sadao  and
      Paggio, Patrizia  and
      Xue, Nianwen  and
      Kim, Seokhwan  and
      Hahm, Younggyun  and
      He, Zhong  and
      Lee, Tony Kyungil  and
      Santus, Enrico  and
      Bond, Francis  and
      Na, Seung-Hoon",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2022.coling-1.99/",
    pages = "1157--1162",
    abstract = "As manually labelling data can be costly, some recent studies tend to augment the training data for improving the generalization power of machine learning models, known as \textit{data augmentation} (DA). With the rise of pre-trained language models (PLMs), some recent works on DA try to synthesize new samples benefiting from the knowledge learned from PLM{'}s pre-training. Along the same direction, we in this paper propose to integrate text-to-text language models and construct a new two-phase framework for augmentation: 1) a fine-tuning phase where PLMs are well adapted to downstream classification with the help of two novel schemes, and 2) a generation phase where the fine-tuned models are leveraged to create new samples for performance lifting. This paradigm opens up a new way of designing fine-tuning schemes to better serve DA in an easy-to-implement manner, and can be easily extended to other desired tasks. We evaluate our proposal on two public classification datasets and demonstrate its effectiveness with remarkable gains."
}