Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, Heng Ji
Abstract
Although large language models have exhibited impressive zero-shot abilities, their huge model sizes generally incur high costs. Recently, semi-parametric language models, which augment a smaller language model with retrieved background knowledge, have alleviated the need to store everything in the model parameters. Although existing semi-parametric language models have demonstrated promising language modeling capabilities, it remains unclear whether they can exhibit zero-shot abilities competitive with their fully-parametric counterparts. In this work, we introduce Zemi, a semi-parametric language model for zero-shot task generalization. To the best of our knowledge, this is the first semi-parametric language model that can demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train Zemi with semi-parametric multitask training, which yields significant improvements over the parametric multitask training proposed by T0. Specifically, during both training and inference, Zemi is equipped with a retrieval system based on the unlabeled pretraining corpus of our backbone model. To address the unique challenges of large-scale retrieval, we further propose a novel retrieval-augmentation fusion module that can effectively incorporate noisy retrieved documents. Finally, we present detailed analyses and ablation studies on the key ingredients for building effective zero-shot semi-parametric language models. Notably, our proposed Zemi_Large model outperforms T0-3B by 16% across seven diverse evaluation tasks while being 3.8x smaller in scale.
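As a rough illustration of the semi-parametric setting the abstract describes, the toy Python sketch below retrieves documents from a small unlabeled corpus and fuses them with a task input by simple concatenation. The corpus, bag-of-words retriever, and `fuse_prompt` helper are hypothetical stand-ins for exposition only; Zemi itself retrieves at scale from the backbone's pretraining corpus and incorporates (possibly noisy) retrievals with a learned retrieval-augmentation fusion module, not prompt concatenation.

```python
# Illustrative sketch only: a toy semi-parametric setup in which a small
# model is augmented with documents retrieved from an unlabeled corpus.
# CORPUS, the bag-of-words scorer, and fuse_prompt() are hypothetical
# stand-ins, not the paper's actual retriever or fusion architecture.
from collections import Counter
import math

CORPUS = [
    "Toronto is the capital city of the province of Ontario, Canada.",
    "The Association for Computational Linguistics holds an annual meeting.",
    "Semi-parametric models combine learned parameters with external memory.",
]

def bow(text: str) -> Counter:
    """Lowercased bag-of-words representation of a text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus documents most similar to the query."""
    q = bow(query)
    return sorted(CORPUS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def fuse_prompt(query: str, docs: list[str]) -> str:
    """Naive fusion: prepend retrieved documents to the task input.
    (A real system would filter noisy retrievals, e.g. with a learned
    fusion/gating module as in the paper.)"""
    context = "\n".join(f"[doc] {d}" for d in docs)
    return f"{context}\n[question] {query}"

print(fuse_prompt("Which city is the capital of Ontario?",
                  retrieve("capital of Ontario")))
```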
- Anthology ID: 2023.findings-acl.246
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 3978–4004
- URL: https://aclanthology.org/2023.findings-acl.246
- DOI: 10.18653/v1/2023.findings-acl.246
- Cite (ACL): Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, and Heng Ji. 2023. Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3978–4004, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks (Wang et al., Findings 2023)
- PDF: https://aclanthology.org/2023.findings-acl.246.pdf