Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism

Miia Rämö, Leo Leppänen


Abstract
In this work, we describe our efforts to improve the variety of language generated by a rule-based NLG system for automated journalism. We present two approaches: one based on inserting completely new words into sentences generated from templates, and another based on replacing words with synonyms. Our initial results from a human evaluation conducted in English indicate that these approaches successfully improve the variety of the language without significantly modifying sentence meaning. We also present variants of the methods applicable to low-resource languages, simulated here using Finnish, in which cross-lingual aligned embeddings are harnessed to make use of linguistic resources in a high-resource language. A human evaluation indicates that while the proposed methods show potential in the low-resource case, additional work is needed to improve their performance.
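The synonym-replacement idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a handful of hypothetical static word vectors (`EMBEDDINGS`), plain cosine similarity, and an illustrative similarity threshold, whereas the paper works with contextual and cross-lingual aligned embeddings. The helper names `nearest_neighbours` and `replace_with_synonym` are likewise hypothetical.

```python
import math

# Hypothetical toy vectors for illustration only; a real system would load
# pretrained (contextual or cross-lingually aligned) embeddings instead.
EMBEDDINGS = {
    "rose": [0.9, 0.1, 0.0],
    "increased": [0.85, 0.15, 0.05],
    "fell": [-0.8, 0.2, 0.0],
    "unemployment": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, embeddings, k=1):
    """Return the k words most similar to `word`, excluding `word` itself."""
    target = embeddings[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


def replace_with_synonym(sentence, word, embeddings, threshold=0.9):
    """Replace every occurrence of `word` with its nearest neighbour, but only
    if their similarity clears `threshold` (an assumed guard against changing
    the sentence's meaning); otherwise return the sentence unchanged."""
    candidate = nearest_neighbours(word, embeddings, k=1)[0]
    if cosine(embeddings[word], embeddings[candidate]) < threshold:
        return sentence
    return " ".join(candidate if tok == word else tok for tok in sentence.split())
```

With these toy vectors, `replace_with_synonym("unemployment rose in Uusimaa", "rose", EMBEDDINGS)` returns `"unemployment increased in Uusimaa"`, while a word whose nearest neighbour falls below the threshold, such as `"unemployment"` here, is left untouched.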
Anthology ID: 2021.hackashop-1.9
Volume: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Month: April
Year: 2021
Address: Online
Editors: Hannu Toivonen, Michele Boggia
Venue: Hackashop
Publisher: Association for Computational Linguistics
Pages: 62–70
URL: https://aclanthology.org/2021.hackashop-1.9
Cite (ACL): Miia Rämö and Leo Leppänen. 2021. Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, pages 62–70, Online. Association for Computational Linguistics.
Cite (Informal): Using contextual and cross-lingual word embeddings to improve variety in template-based NLG for automated journalism (Rämö & Leppänen, Hackashop 2021)
PDF: https://preview.aclanthology.org/nschneid-patch-5/2021.hackashop-1.9.pdf