Controllable Meaning Representation to Text Generation: Linearization and Data Augmentation Strategies

Chris Kedzie, Kathleen McKeown


Abstract
We study the degree to which neural sequence-to-sequence models exhibit fine-grained controllability when performing natural language generation from a meaning representation. Using two task-oriented dialogue generation benchmarks, we systematically compare the effect of four input linearization strategies on controllability and faithfulness. Additionally, we evaluate how a phrase-based data augmentation method can improve performance. We find that properly aligning input sequences during training leads to highly controllable generation, both when training from scratch and when fine-tuning a larger pre-trained model. Data augmentation further improves control on difficult, randomly generated utterance plans.
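To make the notion of "linearization" concrete, the following is a minimal sketch of flattening a task-oriented dialogue meaning representation (a dialogue act plus slot-value pairs) into a token sequence for a sequence-to-sequence model. The MR schema, tag names, and the example MR are illustrative assumptions, not the paper's actual data format; the key idea it demonstrates is that the order in which slots are emitted (e.g., an order aligned with slot mentions in the reference utterance) is what the paper's linearization strategies vary.

```python
# Hypothetical sketch: linearizing a dialogue MR into a flat token
# sequence. The tag scheme and slot names are illustrative only.

def linearize_mr(act, slots, slot_order):
    """Flatten an MR into tokens, emitting slots in the given order.

    A fixed slot_order yields a deterministic linearization; an order
    derived from where slots are mentioned in the reference utterance
    yields an "aligned" linearization, the kind the paper finds leads
    to highly controllable generation.
    """
    tokens = [f"<{act}>"]
    for slot in slot_order:
        if slot in slots:
            tokens += [f"<{slot}>", str(slots[slot]), f"</{slot}>"]
    tokens.append(f"</{act}>")
    return tokens

mr = {"name": "Aromi", "food": "Chinese", "area": "city centre"}
print(" ".join(linearize_mr("inform", mr, ["name", "food", "area"])))
# <inform> <name> Aromi </name> <food> Chinese </food>
# <area> city centre </area> </inform>
```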
Anthology ID:
2020.emnlp-main.419
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5160–5185
URL:
https://aclanthology.org/2020.emnlp-main.419
DOI:
10.18653/v1/2020.emnlp-main.419
Cite (ACL):
Chris Kedzie and Kathleen McKeown. 2020. Controllable Meaning Representation to Text Generation: Linearization and Data Augmentation Strategies. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5160–5185, Online. Association for Computational Linguistics.
Cite (Informal):
Controllable Meaning Representation to Text Generation: Linearization and Data Augmentation Strategies (Kedzie & McKeown, EMNLP 2020)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2020.emnlp-main.419.pdf
Video:
https://slideslive.com/38939225