@inproceedings{lyman-hepner-2024-whatif,
    title = "{W}hat{I}f: Leveraging Word Vectors for Small-Scale Data Augmentation",
    author = "Lyman, Alex  and
      Hepner, Bryce",
    editor = "Hu, Michael Y.  and
      Mueller, Aaron  and
      Ross, Candace  and
      Williams, Adina  and
      Linzen, Tal  and
      Zhuang, Chengxu  and
      Choshen, Leshem  and
      Cotterell, Ryan  and
      Warstadt, Alex  and
      Wilcox, Ethan Gotlieb",
    booktitle = "The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning",
    month = nov,
    year = "2024",
    address = "Miami, FL, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.conll-babylm.20/",
    pages = "229--236",
    abstract = "We introduce WhatIf, a lightly supervised data augmentation technique that leverages word vectors to enhance training data for small-scale language models. Inspired by reading prediction strategies used in education, WhatIf creates new samples by substituting semantically similar words in the training data. We evaluate WhatIf on multiple datasets, demonstrating small but consistent improvements in downstream evaluation compared to baseline models. Finally, we compare WhatIf to other small-scale data augmentation techniques and find that it provides comparable quantitative results at a potential tradeoff to qualitative evaluation."
}