Abstract
Pre-trained language models that learn contextualized word representations from a large un-annotated corpus have become a standard component of many state-of-the-art NLP systems. Despite their successful application to various downstream NLP tasks, the extent to which context shapes these word representations has not been fully explored. In this paper, we present a detailed analysis of contextual impact in Transformer- and BiLSTM-based masked language models. We follow two different approaches to evaluate the impact of context: a masking-based approach that is architecture-agnostic, and a gradient-based approach that requires back-propagation through the network. The findings suggest significant differences in contextual impact between the two model architectures. By further breaking down the analysis by syntactic category, we find that the contextual impact in Transformer-based MLMs aligns well with linguistic intuition. We further explore Transformer attention pruning based on our findings from the context analysis.
- Anthology ID:
- 2020.findings-emnlp.338
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3789–3804
- URL:
- https://aclanthology.org/2020.findings-emnlp.338
- DOI:
- 10.18653/v1/2020.findings-emnlp.338
- Cite (ACL):
- Yi-An Lai, Garima Lalwani, and Yi Zhang. 2020. Context Analysis for Pre-trained Masked Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3789–3804, Online. Association for Computational Linguistics.
- Cite (Informal):
- Context Analysis for Pre-trained Masked Language Models (Lai et al., Findings 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.findings-emnlp.338.pdf
- Data
- CoNLL-2003, GLUE
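The masking-based probe mentioned in the abstract can be illustrated with a minimal sketch: replace one context token with a `[MASK]` embedding and measure how much a target word's contextual representation shifts. Everything below is an assumption for illustration only (a random single-layer self-attention "encoder", a toy vocabulary, and cosine distance as the impact measure); it is not the paper's exact protocol or model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
vocab = {"the": 0, "cat": 1, "sat": 2, "[MASK]": 3}  # toy vocabulary
E = rng.normal(size=(len(vocab), d))                 # toy embedding table
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def encode(token_ids):
    """One self-attention layer as a stand-in 'contextual encoder'."""
    X = E[token_ids]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = np.exp(Q @ K.T / np.sqrt(d))
    A /= A.sum(axis=1, keepdims=True)  # row-wise softmax attention
    return A @ V

def context_impact(token_ids, target, ctx_pos):
    """Masking-based probe: cosine distance between the target token's
    representation with the context token intact vs. masked out."""
    h = encode(token_ids)[target]
    masked = list(token_ids)
    masked[ctx_pos] = vocab["[MASK]"]
    h_m = encode(masked)[target]
    cos = h @ h_m / (np.linalg.norm(h) * np.linalg.norm(h_m))
    return 1.0 - cos

sent = [vocab["the"], vocab["cat"], vocab["sat"]]
# Impact of the context word "cat" on the representation of "sat":
impact = context_impact(sent, target=2, ctx_pos=1)
print(round(float(impact), 4))
```

Because the probe only requires running the encoder twice (with and without the masked context token), it is architecture-agnostic, unlike the gradient-based approach, which needs back-propagation through the network.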