Abstract
Data-to-text (D2T) generation in the biomedical domain is a promising, yet mostly unexplored, field of research. Here, we apply neural models for D2T generation to a real-world dataset consisting of package leaflets of European medicines. We show that fine-tuned transformers are able to generate realistic, multi-sentence text from data in the biomedical domain, yet have important limitations. We also release a new dataset (BioLeaflets) for benchmarking D2T generation models in the biomedical domain.
- Anthology ID:
- 2021.inlg-1.40
- Volume:
- Proceedings of the 14th International Conference on Natural Language Generation
- Month:
- August
- Year:
- 2021
- Address:
- Aberdeen, Scotland, UK
- Editors:
- Anya Belz, Angela Fan, Ehud Reiter, Yaji Sripada
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 364–370
- URL:
- https://aclanthology.org/2021.inlg-1.40
- DOI:
- 10.18653/v1/2021.inlg-1.40
- Cite (ACL):
- Ruslan Yermakov, Nicholas Drago, and Angelo Ziletti. 2021. Biomedical Data-to-Text Generation via Fine-Tuning Transformers. In Proceedings of the 14th International Conference on Natural Language Generation, pages 364–370, Aberdeen, Scotland, UK. Association for Computational Linguistics.
- Cite (Informal):
- Biomedical Data-to-Text Generation via Fine-Tuning Transformers (Yermakov et al., INLG 2021)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2021.inlg-1.40.pdf
- Code:
- bayer-science-for-a-better-life/data2text-bioleaflets
- Data:
- BioLeaflets