Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

Hao Zhang


Abstract
Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in recent years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs. The linear dependency of LMs is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD) to represent an LM as a linear combination of other LMs used as a basis, and derive the closed-form solution. A goodness-of-fit metric for LMD, similar to the coefficient of determination, is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly “correlated”. To further advance the SOTA, we need more diverse and novel LMs that are less dependent on existing ones.
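The abstract describes LMD only at a high level. The snippet below is a minimal illustrative sketch of the idea, assuming the decomposition is computed over token-level contextual embeddings gathered on a shared corpus and that an ordinary least-squares fit serves as the closed-form solution; the names, shapes, and random data are hypothetical and are not the paper's exact formulation or notation.

```python
# Illustrative sketch: fit one LM's contextual embeddings as a linear
# combination of other LMs' embeddings, and score the fit with an
# R^2-style (coefficient-of-determination-like) metric.
# Hypothetical shapes and names; not the paper's exact method.
import numpy as np

def lmd_fit(target_emb: np.ndarray, basis_embs: list[np.ndarray]):
    """Fit target_emb (n_tokens x d_t) as a linear combination of the
    basis models' embeddings (each n_tokens x d_i) via least squares.

    Returns the weight matrix W and an R^2-style goodness-of-fit score.
    """
    X = np.concatenate(basis_embs, axis=1)          # (n_tokens, sum_i d_i)
    # Closed-form least-squares solution: W = argmin ||target_emb - X W||_F^2
    W, *_ = np.linalg.lstsq(X, target_emb, rcond=None)
    residual = target_emb - X @ W
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((target_emb - target_emb.mean(axis=0)) ** 2)
    r2 = 1.0 - ss_res / ss_tot                      # analogue of R^2
    return W, r2

# Toy usage with random stand-ins for contextual embeddings.
rng = np.random.default_rng(0)
target = rng.normal(size=(1000, 768))                       # e.g., BERT-base hidden states
basis = [rng.normal(size=(1000, 768)) for _ in range(3)]    # other LMs' hidden states
W, r2 = lmd_fit(target, basis)
print(f"R^2-style fit: {r2:.3f}")
```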
Anthology ID: 2022.emnlp-main.161
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2508–2517
URL: https://aclanthology.org/2022.emnlp-main.161
Cite (ACL): Hao Zhang. 2022. Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2508–2517, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models (Zhang, EMNLP 2022)
PDF: https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-main.161.pdf