Wordnet for Definition Augmentation with Encoder-Decoder Architecture

Konrad Wojtasik, Arkadiusz Janz, Maciej Piasecki


Abstract
Data augmentation is a difficult task in Natural Language Processing. Simple methods such as insertion, deletion, or substitution, which can be applied relatively easily in other domains, mostly change the meaning of a sentence significantly and yield incorrect examples. Wordnets are potentially a perfect source of rich, high-quality data that, when integrated with the powerful capacity of generative models, can help to solve this complex task. In this work, we use plWordNet, a wordnet of the Polish language, to explore the capability of encoder-decoder architectures in data augmentation of sense glosses. We discuss the limitations of generative methods and perform a qualitative review of generated data samples.
Anthology ID:
2023.gwc-1.6
Volume:
Proceedings of the 12th Global Wordnet Conference
Month:
January
Year:
2023
Address:
University of the Basque Country, Donostia - San Sebastian, Basque Country
Editors:
German Rigau, Francis Bond, Alexandre Rademaker
Venue:
GWC
Publisher:
Global Wordnet Association
Pages:
50–59
URL:
https://aclanthology.org/2023.gwc-1.6
Cite (ACL):
Konrad Wojtasik, Arkadiusz Janz, and Maciej Piasecki. 2023. Wordnet for Definition Augmentation with Encoder-Decoder Architecture. In Proceedings of the 12th Global Wordnet Conference, pages 50–59, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.
Cite (Informal):
Wordnet for Definition Augmentation with Encoder-Decoder Architecture (Wojtasik et al., GWC 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2023.gwc-1.6.pdf