Abstract
Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in recent years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs, defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD), which represents an LM as a linear combination of other LMs serving as a basis, and derive its closed-form solution. A goodness-of-fit metric for LMD, similar to the coefficient of determination, is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly “correlated”. To further advance SOTA, we need more diverse and novel LMs that are less dependent on existing LMs.
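The abstract does not spell out the formulation, but the computation it describes, fitting one LM's representations as a linear combination of other LMs' representations in closed form and scoring the fit with a statistic similar to the coefficient of determination, can be sketched as ordinary least squares over contextual embeddings computed on a shared corpus. The sketch below is an illustration under that assumption; the names `lmd_fit`, `target_emb`, and `basis_embs` are hypothetical, and the paper's exact derivation may differ.

```python
import numpy as np

def lmd_fit(target_emb, basis_embs):
    """Fit the target LM's embeddings as a linear combination of basis LMs'
    embeddings via ordinary least squares, and report an R^2-like score.

    target_emb : (n_tokens, d_target) contextual embeddings of the target LM
    basis_embs : list of (n_tokens, d_k) embeddings from the basis LMs,
                 all computed on the same tokens of a shared corpus
    """
    # Stack the basis embeddings into one design matrix X.
    X = np.concatenate(basis_embs, axis=1)   # (n_tokens, sum_k d_k)
    Y = target_emb                           # (n_tokens, d_target)

    # Closed-form least-squares solution, equivalent to W = (X^T X)^+ X^T Y.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # Goodness of fit analogous to the coefficient of determination:
    # 1 - residual sum of squares / total sum of squares.
    residual = Y - X @ W
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
    return W, 1.0 - ss_res / ss_tot


# Toy usage with random stand-ins for real contextual embeddings.
rng = np.random.default_rng(0)
basis = [rng.normal(size=(1000, 64)) for _ in range(3)]
target = basis[0] @ rng.normal(size=(64, 48)) + 0.1 * rng.normal(size=(1000, 48))
_, r2 = lmd_fit(target, basis)
print(f"R^2-like fit: {r2:.3f}")
```

In this toy example the target is constructed mostly from the first basis LM, so the R²-like score is close to 1; an aggregate score of this kind over many LMs is how the paper's 91% figure should be read.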
- Anthology ID: 2022.emnlp-main.161
- Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates
- Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 2508–2517
- URL: https://aclanthology.org/2022.emnlp-main.161
- DOI: 10.18653/v1/2022.emnlp-main.161
- Cite (ACL): Hao Zhang. 2022. Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2508–2517, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal): Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models (Zhang, EMNLP 2022)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2022.emnlp-main.161.pdf