Combining Parameter-efficient Modules for Task-level Generalisation

Edoardo Maria Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy


Abstract
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks. In this work, we assume that each task is associated with a subset of latent skills from an inventory of arbitrary size. In turn, each skill corresponds to a parameter-efficient (sparse / low-rank) model adapter. We jointly learn the adapters and a routing function that allocates skills to each task; the full network for a task is then instantiated as the average of the parameters of its active skills. We propose several inductive biases that encourage the reuse and composition of skills, including variable-size skill allocation and a dual-speed learning rate. We evaluate our latent-skill model in two main settings: 1) multitask reinforcement learning for instruction following on 8 levels of the BabyAI platform; and 2) few-shot fine-tuning of language models on 160 NLP tasks of the CrossFit benchmark. We find that the modular design of our network enhances sample efficiency in reinforcement learning and few-shot generalisation in supervised learning compared to a series of baselines. These include models whose parameters are fully shared, task-specific, conditionally generated (HyperFormer), or organised as a sparse mixture of experts (TaskMoE).
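The core mechanism described above, where a task's adapter is the average of the parameters of the skills allocated to it, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes that each skill is a LoRA-style pair of low-rank factors (A, B), that the A and B factors of the active skills are averaged separately, and that a learned routing logit per task-skill pair is turned into a binary allocation by thresholding its sigmoid at 0.5. All function and variable names (`allocation_from_logits`, `task_adapter`, `skill_A`, `skill_B`, `Z`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def allocation_from_logits(logits: List[float]) -> List[int]:
    """Hypothetical inference-time routing: activate a skill if sigmoid(logit) > 0.5."""
    return [1 if 1.0 / (1.0 + math.exp(-x)) > 0.5 else 0 for x in logits]


def average_matrices(mats: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted average of equally shaped matrices; with 0/1 weights this is the mean over active skills."""
    total = sum(weights)
    if total == 0:
        raise ValueError("task has no active skills")
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, mats)) / total for j in range(cols)]
        for i in range(rows)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def task_adapter(allocation: List[int], skill_A: List[Matrix], skill_B: List[Matrix]) -> Matrix:
    """Low-rank weight update for one task.

    Assumption: the A and B factors of the active skills are averaged
    separately, and the task's update is their product A @ B.
    """
    A = average_matrices(skill_A, allocation)
    B = average_matrices(skill_B, allocation)
    return matmul(A, B)


# Toy inventory of 3 skills with rank-1 adapters for a 2x2 weight matrix.
skill_A = [[[1.0], [0.0]], [[0.0], [1.0]], [[1.0], [1.0]]]  # each 2x1
skill_B = [[[1.0, 0.0]], [[0.0, 1.0]], [[2.0, 2.0]]]        # each 1x2

# Task-skill allocation matrix: task 0 uses skills 0 and 1, task 1 uses only skill 2.
Z = [[1, 1, 0], [0, 0, 1]]

for t, row in enumerate(Z):
    print(f"task {t}: delta W = {task_adapter(row, skill_A, skill_B)}")
# task 0: delta W = [[0.25, 0.25], [0.25, 0.25]]
# task 1: delta W = [[2.0, 2.0], [2.0, 2.0]]
```

Because the combination divides by the number of active skills, each task can activate a different number of skills (variable-size allocation) while its combined factors remain a convex combination of the skill parameters. In the toy example, task 0 shares capacity between two skills and task 1 reuses skill 2 on its own.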
Anthology ID:
2023.eacl-main.49
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
687–702
URL:
https://aclanthology.org/2023.eacl-main.49
Cite (ACL):
Edoardo Maria Ponti, Alessandro Sordoni, Yoshua Bengio, and Siva Reddy. 2023. Combining Parameter-efficient Modules for Task-level Generalisation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 687–702, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Combining Parameter-efficient Modules for Task-level Generalisation (Ponti et al., EACL 2023)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2023.eacl-main.49.pdf
Video:
https://preview.aclanthology.org/remove-xml-comments/2023.eacl-main.49.mp4