Distillation of encoder-decoder transformers for sequence labelling

Marco Farina, Duccio Pappadopulo, Anant Gupta, Leslie Huang, Ozan Irsoy, Thamar Solorio


Abstract
Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop ever bigger language models. This race has also underscored the need to keep pursuing practical distillation approaches that can leverage the knowledge acquired by these large models in a compute-efficient manner. With this goal in mind, we build on recent work to propose a hallucination-free framework for sequence tagging that is especially suited for distillation. We report new state-of-the-art results across multiple sequence labelling datasets and validate the usefulness of this framework for distilling a large model in a few-shot learning scenario.
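
The abstract does not spell out the training objective, so the following is only a rough, generic illustration of distilling a large encoder-decoder tagger into a smaller one. The checkpoints, the "tag entities:" prompt, the tag serialization format, the temperature, and the loss weighting are all assumptions for the sketch, not the authors' recipe.

```python
# Minimal sketch (assumptions noted above): a small encoder-decoder student
# is trained with cross-entropy on gold tag sequences plus a KL term that
# matches the token-level distribution of a large teacher.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
teacher = T5ForConditionalGeneration.from_pretrained("t5-large").eval()
student = T5ForConditionalGeneration.from_pretrained("t5-small")

def distillation_loss(sentence, tag_sequence, temperature=2.0, alpha=0.5):
    """Cross-entropy on gold tags plus KL to the teacher's softened logits."""
    inputs = tokenizer(sentence, return_tensors="pt")
    labels = tokenizer(tag_sequence, return_tensors="pt").input_ids

    # Student forward pass; the built-in loss is CE against the gold tags.
    student_out = student(**inputs, labels=labels)

    # Teacher forward pass, no gradients needed.
    with torch.no_grad():
        teacher_out = teacher(**inputs, labels=labels)

    # Token-level KL between softened teacher and student distributions.
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_out.logits / t, dim=-1),
        F.softmax(teacher_out.logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    return alpha * student_out.loss + (1 - alpha) * kl

# Hypothetical prompt and target format for illustration only.
loss = distillation_loss(
    "tag entities: Barack Obama visited Paris .",
    "Barack Obama <person> Paris <location>",
)
loss.backward()
```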
Anthology ID:
2023.findings-eacl.192
Volume:
Findings of the Association for Computational Linguistics: EACL 2023
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2539–2549
URL:
https://aclanthology.org/2023.findings-eacl.192
DOI:
10.18653/v1/2023.findings-eacl.192
Cite (ACL):
Marco Farina, Duccio Pappadopulo, Anant Gupta, Leslie Huang, Ozan Irsoy, and Thamar Solorio. 2023. Distillation of encoder-decoder transformers for sequence labelling. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2539–2549, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Distillation of encoder-decoder transformers for sequence labelling (Farina et al., Findings 2023)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2023.findings-eacl.192.pdf
Video:
https://preview.aclanthology.org/naacl-24-ws-corrections/2023.findings-eacl.192.mp4