David Bergés


2020

establishes key guidelines on how, which and when Machine Translation (MT) techniques are worth applying to RDF-to-Text task. Not only do we apply and compare the most prominent MT architecture, the Transformer, but we also analyze state-of-the-art techniques such as Byte Pair Encoding or Back Translation to demonstrate an improvement in generalization. In addition, we empirically show how to tailor these techniques to enhance models relying on learned embeddings rather than using pretrained ones. Automatic metrics suggest that Back Translation can significantly improve model performance up to 7 BLEU points, hence, opening a window for surpassing state-of-the-art results with appropriate architectures.
This work describes the end-to-end system architecture presented at WebNLG Challenge 2020. The system follows the traditional Machine Translation (MT) pipeline, based on the Transformer model, applied in most text-totext problems. Our solution is enriched by means of a Back Translation step over the original corpus. Thus, the system directly relies on lexicalise format since the synthetic data limits the use of delexicalisation.