DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian
Abstract
Transformer-based QA models use input-wide self-attention – i.e. across both the question and the input passage – at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, drastically reducing runtime compute. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer, and directly fine-tune on the target QA dataset. We show that DeFormer versions of BERT and XLNet can be used to speed up QA by over 4.3x, and with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.
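To make the decomposition concrete, here is a minimal PyTorch-style sketch of the idea from the abstract. It is an illustrative reconstruction, not the authors' released code: the class name `DeFormerSketch`, the split point `split_at`, the helper `encode_passage`, and all dimensions are assumptions, and the real model of course loads pre-trained BERT/XLNet weights rather than random layers.

```python
# Sketch of the DeFormer decomposition (illustrative assumptions throughout):
# lower layers run self-attention separately over question and passage, so the
# passage side is question-independent and can be pre-computed and cached;
# upper layers run the usual input-wide self-attention over the concatenation.
import torch
import torch.nn as nn


class DeFormerSketch(nn.Module):
    def __init__(self, d_model=768, nhead=12, num_layers=12, split_at=9):
        super().__init__()

        def layer():
            return nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

        self.lower = nn.ModuleList(layer() for _ in range(split_at))               # decomposed
        self.upper = nn.ModuleList(layer() for _ in range(num_layers - split_at))  # full attention

    def encode_passage(self, passage_emb):
        # Question-independent: run once offline and cache per passage.
        h = passage_emb
        for layer in self.lower:
            h = layer(h)
        return h

    def forward(self, question_emb, cached_passage):
        # At query time only the (short) question passes through the lower layers.
        q = question_emb
        for layer in self.lower:
            q = layer(q)
        h = torch.cat([q, cached_passage], dim=1)  # rejoin for input-wide attention
        for layer in self.upper:
            h = layer(h)
        return h


# Example: cache a passage once, then answer questions against it cheaply.
model = DeFormerSketch()
cached = model.encode_passage(torch.randn(1, 320, 768))  # offline, per passage
output = model(torch.randn(1, 24, 768), cached)          # online, per question
```

The runtime saving comes from the `encode_passage` path: since passages are typically much longer than questions, caching their lower-layer representations removes most of the per-query compute.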
- Anthology ID: 2020.acl-main.411
- Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
- Month: July
- Year: 2020
- Address: Online
- Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 4487–4497
- URL: https://aclanthology.org/2020.acl-main.411
- DOI: 10.18653/v1/2020.acl-main.411
- Cite (ACL): Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, and Niranjan Balasubramanian. 2020. DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4487–4497, Online. Association for Computational Linguistics.
- Cite (Informal): DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering (Cao et al., ACL 2020)
- PDF: https://preview.aclanthology.org/nschneid-patch-2/2020.acl-main.411.pdf
- Code: StonyBrookNLP/deformer
- Data: BoolQ, MultiNLI, RACE, SQuAD