Evaluating Approaches to Personalizing Language Models

Milton King, Paul Cook


Abstract
In this work, we consider the problem of personalizing language models, that is, building language models tailored to the writing style of an individual. Because training a language model requires a large amount of text, and individuals do not necessarily have a large corpus of their own writing to train on, approaches to personalization must be able to rely on only a small amount of text from any one user. We compare three approaches to personalizing a language model trained on a large background corpus, each using a relatively small amount of text from an individual user. We evaluate these approaches using perplexity, as well as two measures based on next word prediction for smartphone soft keyboards. Our results show that when only a small amount of user-specific text is available, an approach based on priming gives the most improvement, whereas when larger amounts of user-specific text are available, an approach based on language model interpolation performs best. Further experiments show that these approaches to personalization outperform language model adaptation based on demographic factors.
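To make two of the terms in the abstract concrete, the sketch below shows linear interpolation of a background distribution with a user-specific one, and perplexity computed over a short piece of user text. It is an illustration only: the abstract does not state which form of interpolation the authors use, so the linear mixture here is an assumption, and the function names, the mixing weight `lam`, and the toy unigram probabilities are all hypothetical.

```python
import math


def interpolate(p_background, p_user, lam):
    """Linear mixture (assumed form): lam * user + (1 - lam) * background."""
    vocab = set(p_background) | set(p_user)
    return {
        w: lam * p_user.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


def perplexity(token_probs):
    """exp of the mean negative log-probability assigned to the observed tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


# Toy unigram distributions; the numbers are invented for illustration.
background = {"the": 0.5, "cat": 0.25, "lol": 0.25}
user = {"the": 0.25, "cat": 0.25, "lol": 0.5}

mixed = interpolate(background, user, lam=0.5)

# Held-out text from the hypothetical user.
user_text = ["lol", "the", "lol"]
ppl_background = perplexity([background[w] for w in user_text])
ppl_mixed = perplexity([mixed[w] for w in user_text])

print(f"background perplexity: {ppl_background:.3f}")
print(f"interpolated perplexity: {ppl_mixed:.3f}")
```

In this toy setting the interpolated distribution assigns more probability to the words the user actually writes, so its perplexity on the user's text is lower than that of the background distribution alone. Lower perplexity is the direction of improvement for that measure.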
Anthology ID:
2020.lrec-1.299
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2461–2469
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.299
Cite (ACL):
Milton King and Paul Cook. 2020. Evaluating Approaches to Personalizing Language Models. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2461–2469, Marseille, France. European Language Resources Association.
Cite (Informal):
Evaluating Approaches to Personalizing Language Models (King & Cook, LREC 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.299.pdf