Abstract
This work presents state-of-the-art results in reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques that make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.
- Anthology ID:
- W18-3606
- Volume:
- Proceedings of the First Workshop on Multilingual Surface Realisation
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Editors:
- Simon Mille, Anja Belz, Bernd Bohnet, Emily Pitler, Leo Wanner
- Venue:
- ACL
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 49–53
- URL:
- https://aclanthology.org/W18-3606
- DOI:
- 10.18653/v1/W18-3606
- Cite (ACL):
- Henry Elder and Chris Hokamp. 2018. Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models. In Proceedings of the First Workshop on Multilingual Surface Realisation, pages 49–53, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models (Elder & Hokamp, ACL 2018)
- PDF:
- https://preview.aclanthology.org/corrections-2024-05/W18-3606.pdf
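The abstract describes two ideas: generating synthetic training data by obfuscating ordinary text into shared-task-style input, and preprocessing input features so that sequence models can use them. The sketch below is only an illustration and does not reproduce the authors' pipeline. It assumes that obfuscation means replacing each word with its lemma and shuffling word order, and that each input token becomes a factored string `lemma|upos|deprel`. The `Token` class, the `make_synthetic_pair` function and the `|` separator are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Token:
    form: str    # surface word as it appears in the sentence
    lemma: str   # base form (assumed to come from an external parser)
    upos: str    # universal POS tag
    deprel: str  # dependency relation to the head


def make_synthetic_pair(tokens, seed=None):
    """Build one hypothetical (source, target) pair from a parsed sentence.

    Assumed obfuscation: keep only lemma-level factors and shuffle the
    token order, so a model must restore inflection and word order.
    """
    rng = random.Random(seed)
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    # Factored source tokens; the '|' separator is an assumption.
    source = " ".join(f"{t.lemma}|{t.upos}|{t.deprel}" for t in shuffled)
    # The target is the original surface realization in its original order.
    target = " ".join(t.form for t in tokens)
    return source, target


sentence = [
    Token("The", "the", "DET", "det"),
    Token("cats", "cat", "NOUN", "nsubj"),
    Token("slept", "sleep", "VERB", "root"),
]
src, tgt = make_synthetic_pair(sentence, seed=0)
print(tgt)  # The cats slept
print(src)  # the same three factored tokens, in a seed-dependent order
```

Running this over a large, automatically parsed corpus would yield many such pairs. That is the general shape of the data augmentation the abstract describes, although the authors' actual obfuscation and feature set may differ.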