@inproceedings{maharana-bansal-2022-grada-graph,
    title = "{G}ra{DA}: Graph Generative Data Augmentation for Commonsense Reasoning",
    author = "Maharana, Adyasha  and
      Bansal, Mohit",
    editor = "Calzolari, Nicoletta  and
      Huang, Chu-Ren  and
      Kim, Hansaem  and
      Pustejovsky, James  and
      Wanner, Leo  and
      Choi, Key-Sun  and
      Ryu, Pum-Mo  and
      Chen, Hsin-Hsi  and
      Donatelli, Lucia  and
      Ji, Heng  and
      Kurohashi, Sadao  and
      Paggio, Patrizia  and
      Xue, Nianwen  and
      Kim, Seokhwan  and
      Hahm, Younggyun  and
      He, Zhong  and
      Lee, Tony Kyungil  and
      Santus, Enrico  and
      Bond, Francis  and
      Na, Seung-Hoon",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2022.coling-1.397/",
    pages = "4499--4516",
    abstract = "Recent advances in commonsense reasoning have been fueled by the availability of large-scale human annotated datasets. Manual annotation of such datasets, many of which are based on existing knowledge bases, is expensive and not scalable. Moreover, it is challenging to build augmentation data for commonsense reasoning because the synthetic questions need to adhere to real-world scenarios. Hence, we present GraDA, a graph-generative data augmentation framework to synthesize factual data samples from knowledge graphs for commonsense reasoning datasets. First, we train a graph-to-text model for conditional generation of questions from graph entities and relations. Then, we train a generator with GAN loss to generate distractors for synthetic questions. Our approach improves performance for SocialIQA, CODAH, HellaSwag and CommonsenseQA, and works well for generative tasks like ProtoQA. We show improvement in robustness to semantic adversaries after training with GraDA and provide human evaluation of the quality of synthetic datasets in terms of factuality and answerability. Our work provides evidence and encourages future research into graph-based generative data augmentation."
}