XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages
Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma
Abstract
Multiple business scenarios require automated generation of descriptive, human-readable text from structured input data. This has led to substantial recent work on fact-to-text generation systems. Unfortunately, previous work on fact-to-text (F2T) generation has focused primarily on English, mainly due to the high availability of relevant datasets. Only recently was the problem of cross-lingual fact-to-text (XF2T) generation across multiple languages proposed, along with a dataset, XAlign, covering eight languages. However, there has been no rigorous work on the actual XF2T generation problem. We extend the XAlign dataset with annotated data for four more languages: Punjabi, Malayalam, Assamese and Oriya. We conduct an extensive study using popular Transformer-based text generation models on our extended multi-lingual dataset, which we call XAlignV2. Further, we investigate the performance of different text generation strategies: multiple variations of pretraining, fact-aware embeddings and structure-aware input encoding. Our extensive experiments show that a multi-lingual mT5 model which uses fact-aware embeddings with structure-aware input encoding achieves the best results (30.90 BLEU, 55.12 METEOR and 59.17 chrF++) across the twelve languages. We make our code, dataset and model publicly available, and hope that this will help advance further research in this critical area.
- Anthology ID: 2023.inlg-main.2
- Volume: Proceedings of the 16th International Natural Language Generation Conference
- Month: September
- Year: 2023
- Address: Prague, Czechia
- Editors: C. Maria Keet, Hung-Yi Lee, Sina Zarrieß
- Venues: INLG | SIGDIAL
- SIG: SIGGEN
- Publisher: Association for Computational Linguistics
- Pages: 15–27
- URL: https://aclanthology.org/2023.inlg-main.2
- DOI: 10.18653/v1/2023.inlg-main.2
- Cite (ACL): Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, and Vasudeva Varma. 2023. XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages. In Proceedings of the 16th International Natural Language Generation Conference, pages 15–27, Prague, Czechia. Association for Computational Linguistics.
- Cite (Informal): XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages (Sagare et al., INLG-SIGDIAL 2023)
- PDF: https://preview.aclanthology.org/fix-dup-bibkey/2023.inlg-main.2.pdf