Surface Realization Using Pretrained Language Models

Farhood Farahnak, Laya Rafiee, Leila Kosseim, Thomas Fevens


Abstract
In the context of Natural Language Generation, surface realization is the task of generating the linear form of a text following a given grammar. Surface realization models usually consist of a cascade of complex sub-modules, either rule-based or neural network-based, each responsible for a specific sub-task. In this work, we show that a single encoder-decoder language model can be used in an end-to-end fashion for all sub-tasks of surface realization. The model is based on the BART language model: it receives a linear representation of the unordered, non-inflected tokens of a sentence along with their corresponding Universal Dependencies information, and produces the linear sequence of inflected tokens together with the missing words. The model was evaluated on the shallow and deep tracks of the 2020 Surface Realization Shared Task (SR'20) using both human and automatic evaluation. The results indicate that, despite its simplicity, our model achieves competitive results among all participants in the shared task.
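
The abstract describes an encoder-decoder setup in which a pretrained BART model maps a linearized, shuffled representation of lemmas and their Universal Dependencies annotations to the ordered, inflected sentence. The sketch below is only an illustration of that idea under assumptions, not the authors' code: the input linearization format ("lemma|UPOS|deprel"), the example sentence, and the choice of the Hugging Face "facebook/bart-base" checkpoint are all hypothetical.

    # Illustrative sketch (not the authors' implementation): treating surface
    # realization as sequence-to-sequence generation with a pretrained BART model.
    from transformers import BartTokenizer, BartForConditionalGeneration

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    # Hypothetical linearization: each token as "lemma|UPOS|deprel", in arbitrary order.
    source = "sell|VERB|root car|NOUN|obj she|PRON|nsubj the|DET|det"
    target = "She sold the car ."

    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids

    # Training would minimize this cross-entropy loss over (source, target) pairs;
    # at inference time, model.generate(**inputs) produces the realized sentence.
    loss = model(**inputs, labels=labels).loss

In this framing, token ordering, inflection, and (for the deep track) insertion of missing words are all left to the decoder, rather than being handled by separate pipeline modules.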
Anthology ID:
2020.msr-1.7
Volume:
Proceedings of the Third Workshop on Multilingual Surface Realisation
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
MSR
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
57–63
URL:
https://aclanthology.org/2020.msr-1.7
Cite (ACL):
Farhood Farahnak, Laya Rafiee, Leila Kosseim, and Thomas Fevens. 2020. Surface Realization Using Pretrained Language Models. In Proceedings of the Third Workshop on Multilingual Surface Realisation, pages 57–63, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Surface Realization Using Pretrained Language Models (Farahnak et al., MSR 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.msr-1.7.pdf