BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature
Giacomo Frisoni, Miki Mizutani, Gianluca Moro, Lorenzo Valgimigli
Abstract
The latest batch of research has equipped language models with the ability to attend over relevant and factual information from non-parametric external sources, drawing a complementary path to architectural scaling. Besides mastering language, exploiting and contextualizing the latent world knowledge is crucial in complex domains like biomedicine. However, most works in the field rely on general-purpose models supported by databases like Wikipedia and Books. We introduce BioReader, the first retrieval-enhanced text-to-text model for biomedical natural language processing. Our domain-specific T5-based solution augments the input prompt by fetching and assembling relevant scientific literature chunks from a neural database with ≈60 million tokens centered on PubMed. We fine-tune and evaluate BioReader on a broad array of downstream tasks, significantly outperforming several state-of-the-art methods despite using up to 3x fewer parameters. In tandem with extensive ablation studies, we show that domain knowledge can be easily altered or supplemented to make the model generate correct predictions bypassing the retraining step and thus addressing the literature overload issue.- Anthology ID:
- 2022.emnlp-main.390
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 5770–5793
- Language:
- URL:
- https://aclanthology.org/2022.emnlp-main.390
- DOI:
- 10.18653/v1/2022.emnlp-main.390
- Cite (ACL):
- Giacomo Frisoni, Miki Mizutani, Gianluca Moro, and Lorenzo Valgimigli. 2022. BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5770–5793, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature (Frisoni et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-main.390.pdf