Cloze Distillation: Improving Neural Language Models with Human Next-Word Prediction

Tiwalayo Eisape, Noga Zaslavsky, Roger Levy


Abstract
Contemporary autoregressive language models (LMs) trained purely on corpus data have been shown to capture numerous features of human incremental processing. However, past work has also suggested dissociations between corpus probabilities and human next-word predictions. Here we evaluate several state-of-the-art language models for their match to human next-word predictions and to reading time behavior from eye movements. We then propose a novel method for distilling the linguistic information implicit in human linguistic predictions into pre-trained LMs: Cloze Distillation. We apply this method to a baseline neural LM and show potential improvement in reading time prediction and generalization to held-out human cloze data.
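The abstract does not state the training objective, so the following is only a rough illustration. It is a minimal sketch that assumes a generic knowledge-distillation setup: at each context position, the LM's next-word distribution is pulled toward the empirical distribution of human cloze responses via KL divergence. That term is mixed with ordinary corpus cross-entropy using a weight `alpha`. The sketch also computes surprisal, the usual bridge from LM probabilities to reading times. The names `cloze_distribution`, `distillation_loss`, and `alpha` are hypothetical, the three-word vocabulary and probabilities are made-up toy values, and this is not presented as the authors' exact loss.

```python
import math
from collections import Counter

def cloze_distribution(responses, vocab):
    """Empirical human next-word distribution from raw cloze responses.

    Assumption: responses outside `vocab` are simply dropped."""
    counts = Counter(r for r in responses if r in vocab)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no in-vocabulary cloze responses")
    return {w: counts[w] / total for w in vocab}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; words with p(w) == 0 contribute nothing."""
    return sum(pw * math.log(pw / max(q[w], eps)) for w, pw in p.items() if pw > 0)

def surprisal(q, word):
    """Surprisal -log2 q(word), in bits."""
    return -math.log2(q[word])

def distillation_loss(model_probs, human_probs, corpus_word, alpha=0.5):
    """Hypothetical interpolated objective (not necessarily the paper's):
    alpha * KL(human || model) + (1 - alpha) * corpus cross-entropy."""
    cross_entropy = -math.log(model_probs[corpus_word])
    return alpha * kl_divergence(human_probs, model_probs) + (1 - alpha) * cross_entropy

# Toy example for one context position (all values are illustrative).
vocab = ["dog", "cat", "ball"]
human = cloze_distribution(["dog", "dog", "cat", "dog"], vocab)
model = {"dog": 0.5, "cat": 0.3, "ball": 0.2}
loss = distillation_loss(model, human, corpus_word="dog", alpha=0.5)
print(round(loss, 4), surprisal(model, "dog"))
```

Using KL(human || model) penalizes the model most where humans place probability mass that the model neglects. Setting `alpha` to 0 recovers plain language-model training.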
Anthology ID: 2020.conll-1.49
Volume: Proceedings of the 24th Conference on Computational Natural Language Learning
Month: November
Year: 2020
Address: Online
Venue: CoNLL
SIG: SIGNLL
Publisher: Association for Computational Linguistics
Pages: 609–619
URL: https://aclanthology.org/2020.conll-1.49
DOI: 10.18653/v1/2020.conll-1.49
Cite (ACL):
Tiwalayo Eisape, Noga Zaslavsky, and Roger Levy. 2020. Cloze Distillation: Improving Neural Language Models with Human Next-Word Prediction. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 609–619, Online. Association for Computational Linguistics.
Cite (Informal):
Cloze Distillation: Improving Neural Language Models with Human Next-Word Prediction (Eisape et al., CoNLL 2020)
PDF: https://preview.aclanthology.org/ingestion-script-update/2020.conll-1.49.pdf
Data: WikiText-103, WikiText-2