Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks

Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, Heng Ji


Abstract
Although large language models have exhibited impressive zero-shot abilities, their huge size generally incurs high costs. Recently, semi-parametric language models, which augment a smaller language model with retrieved background knowledge, have alleviated the need to store all knowledge in the model parameters. Although existing semi-parametric language models have demonstrated promising language modeling capabilities, it remains unclear whether they can exhibit zero-shot abilities competitive with their fully-parametric counterparts. In this work, we introduce Zemi, a semi-parametric language model for zero-shot task generalization. To the best of our knowledge, this is the first semi-parametric language model to demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train Zemi with semi-parametric multitask training, which yields significant improvements over the parametric multitask training proposed by T0. Specifically, during both training and inference, Zemi is equipped with a retrieval system based on the unlabeled pretraining corpus of our backbone model. To address the unique challenges of large-scale retrieval, we further propose a novel retrieval-augmentation fusion module that can effectively incorporate noisy retrieved documents. Finally, we present detailed analyses and ablation studies of the key ingredients for building effective zero-shot semi-parametric language models. Notably, our proposed Zemi_Large model outperforms T0-3B by 16% across seven diverse evaluation tasks while being 3.8x smaller in scale.
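To make the architectural idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of a retrieval-augmentation fusion layer: encoded retrieved documents are fused into the backbone's hidden states through cross-attention, with a learned scalar gate that can down-weight noisy retrievals. The gated cross-attention design, the class name GatedRetrievalFusion, and all dimensions are illustrative assumptions, not the paper's actual implementation.

    # Hypothetical sketch of retrieval-augmentation fusion (not the paper's code).
    # Retrieved documents are encoded separately; a gated cross-attention layer
    # lets the backbone attend to them while suppressing noisy retrievals.
    import torch
    import torch.nn as nn

    class GatedRetrievalFusion(nn.Module):
        """Fuse encoded retrieved documents into backbone hidden states."""

        def __init__(self, d_model: int, n_heads: int = 8):
            super().__init__()
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # Per-token scalar gate in [0, 1]: down-weights unhelpful documents.
            self.gate = nn.Sequential(nn.Linear(2 * d_model, 1), nn.Sigmoid())
            self.norm = nn.LayerNorm(d_model)

        def forward(self, hidden, doc_states, doc_mask=None):
            # hidden:     (batch, seq_len, d_model)          backbone states
            # doc_states: (batch, n_docs * doc_len, d_model) concatenated documents
            # doc_mask:   optional (batch, n_docs * doc_len) padding mask
            attended, _ = self.cross_attn(
                hidden, doc_states, doc_states, key_padding_mask=doc_mask
            )
            g = self.gate(torch.cat([hidden, attended], dim=-1))  # (batch, seq_len, 1)
            return self.norm(hidden + g * attended)

    if __name__ == "__main__":
        fusion = GatedRetrievalFusion(d_model=512)
        hidden = torch.randn(2, 16, 512)    # toy backbone hidden states
        docs = torch.randn(2, 3 * 64, 512)  # 3 retrieved documents of 64 tokens each
        print(fusion(hidden, docs).shape)   # torch.Size([2, 16, 512])

The scalar gate is one simple way a model could learn to "effectively incorporate noisy retrieved documents" as the abstract describes; the paper's actual fusion mechanism may differ.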
Anthology ID:
2023.findings-acl.246
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3978–4004
URL:
https://aclanthology.org/2023.findings-acl.246
DOI:
10.18653/v1/2023.findings-acl.246
Cite (ACL):
Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, and Heng Ji. 2023. Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3978–4004, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks (Wang et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.246.pdf