CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals

Scott Novotney, Sreeparna Mukherjee, Zeeshan Ahmed, Andreas Stolcke


Abstract
We propose a framework to modularize the training of neural language models that use diverse forms of context by eliminating the need to jointly train context and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one type of contextual data and adapts to novel context types. The model consists of a pretrained neural sentence LM, a BERT-based contextual encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and contextual evidence. When contextually annotated data is unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real context data can be introduced later and used to adapt a small number of parameters that map contextual data into the decoder’s embedding space. We validate the CUE framework on a NYTimes text corpus with multiple metadata types, for which the LM perplexity can be lowered from 36.6 to 27.4 by conditioning on context. Bootstrapping a contextual LM with only a subset of the metadata during training retains 85% of the achievable gain. Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context. Furthermore, we can swap one type of pretrained sentence LM for another without retraining the context encoders, by only adapting the decoder model. Overall, we obtain a modular framework that allows incremental, scalable training of context-enhanced LMs.
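To make the reported figures easier to read, the short sketch below works through the arithmetic behind the headline result (perplexity 36.6 lowered to 27.4) and shows one plausible way to express a "retained gain" as a fraction of that improvement. The function names are hypothetical, and the definition of retained gain as a share of the absolute perplexity reduction is an assumption; the abstract does not say whether the authors measure it on perplexity or on another scale such as log-perplexity.

```python
# Illustrative arithmetic only; helper names and the "retained gain"
# definition are assumptions, not taken from the paper.

BASELINE_PPL = 36.6  # perplexity without context (from the abstract)
BEST_PPL = 27.4      # perplexity with full context conditioning (from the abstract)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which perplexity drops relative to the baseline."""
    return (baseline - improved) / baseline


def retained_fraction(baseline: float, best: float, achieved: float) -> float:
    """Share of the best-case absolute perplexity reduction that a
    given configuration achieves (assumed definition)."""
    return (baseline - achieved) / (baseline - best)


if __name__ == "__main__":
    drop = relative_reduction(BASELINE_PPL, BEST_PPL)
    print(f"Relative perplexity reduction: {drop:.1%}")
```

Under this assumed definition, a configuration that reaches the best perplexity has a retained fraction of 1.0, and one that stays at the baseline has a retained fraction of 0.0.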
Anthology ID:
2022.findings-acl.265
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3368–3379
URL:
https://aclanthology.org/2022.findings-acl.265
DOI:
10.18653/v1/2022.findings-acl.265
Cite (ACL):
Scott Novotney, Sreeparna Mukherjee, Zeeshan Ahmed, and Andreas Stolcke. 2022. CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3368–3379, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals (Novotney et al., Findings 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.findings-acl.265.pdf
Video:
https://preview.aclanthology.org/auto-file-uploads/2022.findings-acl.265.mp4