Do Long-Range Language Models Actually Use Long-Range Context?

Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer


Abstract
Language models are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions. Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer language models, which can process much longer sequences than models of the past. However, the ways in which such models take advantage of the long-range context remain unclear. In this paper, we perform a fine-grained analysis of two long-range Transformer language models (including the Routing Transformer, which achieves state-of-the-art perplexity on the PG-19 long-sequence LM benchmark dataset) that accept input sequences of up to 8K tokens. Our results reveal that providing long-range context (i.e., beyond the previous 2K tokens) to these models only improves their predictions on a small set of tokens (e.g., those that can be copied from the distant context) and does not help at all for sentence-level prediction tasks. Finally, we discover that PG-19 contains a variety of different document types and domains, and that long-range context helps most for literary novels (as opposed to textbooks or magazines).
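The core measurement behind these findings can be illustrated with a small sketch (not the authors' code: the toy model, segment length, and context sizes below are hypothetical stand-ins, and a trivial model is used only so the snippet runs). The idea is to score the same target tokens twice, once conditioned on roughly the previous 2K tokens and once on a much longer prefix, and compare per-token negative log-likelihoods; tokens whose NLL drops with the longer prefix are the ones that benefit from long-range context.

```python
# Minimal sketch of the context-ablation measurement described in the abstract.
# NOT the paper's code: DummyCausalLM is a stand-in for a long-range LM
# (e.g., Routing Transformer); it ignores context entirely, so the measured
# improvement here will be ~0. Only the measurement procedure is the point.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DummyCausalLM(nn.Module):
    """Toy causal LM: embeds each token and predicts the next one."""
    def __init__(self, vocab_size=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, input_ids):          # (batch, seq_len) -> (batch, seq_len, vocab)
        return self.proj(self.embed(input_ids))

def per_token_nll(model, input_ids, num_target_tokens):
    """NLL of the last `num_target_tokens` tokens, conditioned on the preceding prefix."""
    logits = model(input_ids[:, :-1])       # predict token t+1 from tokens <= t
    targets = input_ids[:, 1:]
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape)
    return nll[:, -num_target_tokens:]      # keep only the evaluated suffix

model = DummyCausalLM()
doc = torch.randint(0, 256, (1, 8192))      # one "document" of 8K token ids
target_len = 512                             # tokens whose predictions we compare

# Same target tokens, two amounts of preceding context.
short_ctx = doc[:, -(2048 + target_len):]    # ~2K tokens of context
long_ctx = doc                               # full 8K-token prefix

nll_short = per_token_nll(model, short_ctx, target_len)
nll_long = per_token_nll(model, long_ctx, target_len)

# Positive values mark tokens whose prediction improves with long-range context.
improvement = nll_short - nll_long
print("mean NLL improvement from long-range context:", improvement.mean().item())
```

With a real long-range model, inspecting which tokens have large positive `improvement` values (for instance, tokens that can be copied from the distant prefix) is what a per-token analysis of this kind looks at.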
Anthology ID:
2021.emnlp-main.62
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
807–822
URL:
https://aclanthology.org/2021.emnlp-main.62
DOI:
10.18653/v1/2021.emnlp-main.62
Cite (ACL):
Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, and Mohit Iyyer. 2021. Do Long-Range Language Models Actually Use Long-Range Context?. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 807–822, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Do Long-Range Language Models Actually Use Long-Range Context? (Sun et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2021.emnlp-main.62.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-2/2021.emnlp-main.62.mp4
Data
PG-19