Abstract
This paper describes the submission by NUIG-DSI to the GEM benchmark 2021. We participate in the modeling shared task where we submit outputs on four datasets for data-to-text generation, namely, DART, WebNLG (en), E2E and CommonGen. We follow an approach similar to the one described in the GEM benchmark paper where we use the pre-trained T5-base model for our submission. We train this model on additional monolingual data where we experiment with different masking strategies specifically focused on masking entities, predicates and concepts as well as a random masking strategy for pre-training. In our results we find that random masking performs the best in terms of automatic evaluation metrics, though the results are not statistically significantly different compared to other masking strategies.
- Anthology ID:
- 2021.gem-1.13
- Volume:
- Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Venue:
- GEM
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 148–154
- URL:
- https://aclanthology.org/2021.gem-1.13
- DOI:
- 10.18653/v1/2021.gem-1.13
- Cite (ACL):
- Nivranshu Pasricha, Mihael Arcan, and Paul Buitelaar. 2021. NUIG-DSI’s submission to The GEM Benchmark 2021. In Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), pages 148–154, Online. Association for Computational Linguistics.
- Cite (Informal):
- NUIG-DSI’s submission to The GEM Benchmark 2021 (Pasricha et al., GEM 2021)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2021.gem-1.13.pdf
- Data:
- CommonGen, DBpedia, GEM