Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, Goran Glavaš
Abstract
Following the major success of neural language models (LMs) such as BERT or GPT-2 on a variety of language understanding tasks, recent work has focused on injecting (structured) knowledge from external resources into these models. On the one hand, joint pre-training (i.e., training from scratch, adding objectives based on external knowledge to the primary LM objective) may be prohibitively computationally expensive; on the other hand, post-hoc fine-tuning on external knowledge may lead to the catastrophic forgetting of distributional knowledge. In this work, we investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus, respectively, using adapter training. While overall results on the GLUE benchmark paint an inconclusive picture, a deeper analysis reveals that our adapter-based models substantially outperform BERT (up to 15-20 performance points) on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS. We also open source all our experiments and relevant code under: https://github.com/wluper/retrograph.
- Anthology ID:
- 2020.deelio-1.5
- Volume:
- Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- DeeLIO
- Publisher:
- Association for Computational Linguistics
- Pages:
- 43–49
- URL:
- https://aclanthology.org/2020.deelio-1.5
- DOI:
- 10.18653/v1/2020.deelio-1.5
- Cite (ACL):
- Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, and Goran Glavaš. 2020. Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers. In Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 43–49, Online. Association for Computational Linguistics.
- Cite (Informal):
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers (Lauscher et al., DeeLIO 2020)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2020.deelio-1.5.pdf
- Code:
- wluper/retrograph
- Data:
- CoLA, ConceptNet, SST, SuperGLUE