Abstract
In this work, we describe our efforts in improving the variety of language generated from a rule-based NLG system for automated journalism. We present two approaches: one based on inserting completely new words into sentences generated from templates, and another based on replacing words with synonyms. Our initial results from a human evaluation conducted in English indicate that these approaches successfully improve the variety of the language without significantly modifying sentence meaning. We also present variations of the methods applicable to low-resource languages, simulated here using Finnish, where cross-lingual aligned embeddings are harnessed to make use of linguistic resources in a high-resource language. A human evaluation indicates that while proposed methods show potential in the low-resource case, additional work is needed to improve their performance.
- Anthology ID:
- 2021.hackashop-1.9
- Volume:
- Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
- Month:
- April
- Year:
- 2021
- Address:
- Online
- Editors:
- Hannu Toivonen, Michele Boggia
- Venue:
- Hackashop
- Publisher:
- Association for Computational Linguistics
- Pages:
- 62–70
- URL:
- https://aclanthology.org/2021.hackashop-1.9
- Cite (ACL):
- Miia Rämö and Leo Leppänen. 2021. Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, pages 62–70, Online. Association for Computational Linguistics.
- Cite (Informal):
- Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism (Rämö & Leppänen, Hackashop 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2021.hackashop-1.9.pdf