Iterative Paraphrastic Augmentation with Discriminative Span Alignment

Ryan Culkin; J. Edward Hu; Elias Stengel-Eskin; Guanghui Qin; Benjamin Van Durme

doi:10.1162/tacl_a_00380

Abstract

We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing datasets or the rapid creation of new datasets using a small, manually produced seed corpus. We demonstrate our approach with experiments on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. With four days of training data collection for a span alignment model and one day of parallel compute, we automatically generate and release to the community 495,300 unique (Frame,Trigger) pairs in diverse sentential contexts, a roughly 50-fold expansion atop FrameNet v1.7. The resulting dataset is intrinsically and extrinsically evaluated in detail, showing positive results on a downstream task.

Anthology ID: 2021.tacl-1.30
Volume: Transactions of the Association for Computational Linguistics, Volume 9
Year: 2021
Address: Cambridge, MA
Editors: Brian Roark, Ani Nenkova
Venue: TACL
Publisher: MIT Press
Pages: 494–509
URL: https://aclanthology.org/2021.tacl-1.30
DOI: 10.1162/tacl_a_00380
Cite (ACL): Ryan Culkin, J. Edward Hu, Elias Stengel-Eskin, Guanghui Qin, and Benjamin Van Durme. 2021. Iterative Paraphrastic Augmentation with Discriminative Span Alignment. Transactions of the Association for Computational Linguistics, 9:494–509.
Cite (Informal): Iterative Paraphrastic Augmentation with Discriminative Span Alignment (Culkin et al., TACL 2021)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2021.tacl-1.30.pdf
Video: https://preview.aclanthology.org/emnlp-22-attachments/2021.tacl-1.30.mp4
