2021
Now, It’s Personal: The Need for Personalized Word Sense Disambiguation
Milton King | Paul Cook
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Authors tend to use predominantly a single sense for a given lemma, and that sense can differ from author to author. This might not be captured by an author-agnostic word sense disambiguation (WSD) model trained on text from multiple authors. Our work finds that WordNet’s first senses, the predominant senses of our dataset’s genre, and the predominant senses of an individual author can all differ; as a result, author-agnostic models can perform well over the entire dataset but poorly on individual authors. In this work, we explore methods for personalizing WSD models, tailoring existing state-of-the-art models toward an individual by exploiting that author’s sense distributions. We propose a novel WSD dataset and show that personalizing a WSD system with knowledge of an author’s sense distributions or predominant senses can greatly increase its performance.
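As a rough illustration of the predominant-sense personalization described above (this is not the paper’s actual system; the data layout and function names below are placeholders), a minimal sketch might back off from an author’s observed predominant sense to an author-agnostic model:

```python
from collections import Counter, defaultdict

def build_author_sense_counts(labeled_usages):
    """Count how often each author uses each sense of each lemma.
    `labeled_usages` is assumed to yield (author, lemma, sense) triples."""
    counts = defaultdict(Counter)
    for author, lemma, sense in labeled_usages:
        counts[(author, lemma)][sense] += 1
    return counts

def personalized_predict(author, lemma, context, counts, base_predict):
    """Use the author's predominant sense for this lemma when one has been
    observed; otherwise fall back to an author-agnostic WSD model."""
    sense_counts = counts.get((author, lemma))
    if sense_counts:
        return sense_counts.most_common(1)[0][0]
    return base_predict(lemma, context)
```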
UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders
Milton King | Ali Hakimi Parizi | Samin Fakharian | Paul Cook
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1. We explore the use of statistical baseline features, masked language models, and character-level encoders to predict the complexity of a target token in context. Our best system combines information from these three sources. The results indicate that information from masked language models and character-level encoders can be combined to improve lexical complexity prediction.
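A minimal sketch of combining feature groups for complexity prediction (purely illustrative; the feature widths, the Ridge regressor, and the toy data here are assumptions, not the task systems themselves):

```python
import numpy as np
from sklearn.linear_model import Ridge

def combine_features(baseline_feats, mlm_feats, char_feats):
    """Concatenate per-token feature groups into a single matrix."""
    return np.concatenate([baseline_feats, mlm_feats, char_feats], axis=1)

# Toy data: 100 target tokens with arbitrary feature widths and gold
# complexity scores in [0, 1].
rng = np.random.default_rng(0)
X = combine_features(rng.random((100, 4)), rng.random((100, 8)), rng.random((100, 8)))
y = rng.random(100)

model = Ridge(alpha=1.0).fit(X, y)
print(model.predict(X[:3]))
```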
2020
Evaluating Approaches to Personalizing Language Models
Milton King | Paul Cook
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this work, we consider the problem of personalizing language models, that is, building language models that are tailored to the writing style of an individual. Because training language models requires a large amount of text, and individuals do not necessarily possess a large corpus of their writing that could be used for training, approaches to personalizing language models must be able to rely on only a small amount of text from any one user. In this work, we compare three approaches to personalizing a language model that was trained on a large background corpus using a relatively small amount of text from an individual user. We evaluate these approaches using perplexity, as well as two measures based on next word prediction for smartphone soft keyboards. Our results show that when only a small amount of user-specific text is available, an approach based on priming gives the most improvement, while when larger amounts of user-specific text are available, an approach based on language model interpolation performs best. We carry out further experiments to show that these approaches to personalization outperform language model adaptation based on demographic factors.
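As a rough sketch of the interpolation approach mentioned above (not taken from the paper itself; the mixing weight and toy distributions are placeholders), the background model’s next-word distribution is mixed with one estimated from the user’s own text:

```python
def interpolate(p_background, p_user, lam=0.3):
    """Mix two next-word distributions (word -> probability); `lam` is the
    weight on the user-specific model and would be tuned on held-out text."""
    vocab = set(p_background) | set(p_user)
    return {w: lam * p_user.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
            for w in vocab}

# Toy example: the user's own text shifts probability toward "gig".
p_bg = {"meeting": 0.6, "gig": 0.1, "party": 0.3}
p_us = {"meeting": 0.2, "gig": 0.7, "party": 0.1}
print(interpolate(p_bg, p_us, lam=0.5))
```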
2019
UNBNLP at SemEval-2019 Task 5 and 6: Using Language Models to Detect Hate Speech and Offensive Language
Ali Hakimi Parizi | Milton King | Paul Cook
Proceedings of the 13th International Workshop on Semantic Evaluation
In this paper we apply a range of approaches to language modeling – including word-level n-gram and neural language models, and character-level neural language models – to the problem of detecting hate speech and offensive language. Our findings indicate that language models are able to capture knowledge of whether text is hateful or offensive. However, our findings also indicate that more conventional approaches to text classification often perform similarly or better.
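One common way to turn language models into a classifier of this kind (a hedged illustration only, not necessarily the exact setup in the paper) is to score a text under class-specific models and pick the class whose model finds it least surprising; `lm.log_prob` is a placeholder interface:

```python
def avg_log_prob(tokens, lm):
    """Average per-token log-probability of `tokens` under a language model
    exposing lm.log_prob(token, history); `lm` is a placeholder interface."""
    return sum(lm.log_prob(tok, tokens[:i]) for i, tok in enumerate(tokens)) / len(tokens)

def classify(tokens, lm_offensive, lm_other):
    """Assign the label whose language model finds the text least surprising."""
    if avg_log_prob(tokens, lm_offensive) > avg_log_prob(tokens, lm_other):
        return "offensive"
    return "not offensive"
```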
2018
UNBNLP at SemEval-2018 Task 10: Evaluating unsupervised approaches to capturing discriminative attributes
Milton King | Ali Hakimi Parizi | Paul Cook
Proceedings of the 12th International Workshop on Semantic Evaluation
In this paper we present three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency. We show that, of these approaches, the simple approach based on word co-occurrence performs best. We further consider supervised and unsupervised approaches to combining information from these models, but these approaches do not improve on the word co-occurrence model.
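A minimal sketch of the sentence-level co-occurrence idea (illustrative; `cooc` stands in for counts gathered from some background corpus, and the toy numbers are invented): an attribute is predicted to be discriminative for word1 versus word2 when it co-occurs with word1 in more sentences than with word2.

```python
def is_discriminative(word1, word2, attribute, cooc):
    """Predict that `attribute` is discriminative for word1 vs. word2 when it
    co-occurs with word1 in more sentences than with word2."""
    return cooc.get((word1, attribute), 0) > cooc.get((word2, attribute), 0)

# Toy counts: "wings" should discriminate "plane" from "car".
cooc = {("plane", "wings"): 42, ("car", "wings"): 3}
print(is_discriminative("plane", "car", "wings", cooc))  # True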
Leveraging distributed representations and lexico-syntactic fixedness for token-level prediction of the idiomaticity of English verb-noun combinations
Milton King | Paul Cook
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Verb-noun combinations (VNCs), e.g., blow the whistle, hit the roof, and see stars, are a common type of English idiom whose usages are ambiguous between idiomatic and literal readings. In this paper we propose and evaluate models for classifying VNC usages as idiomatic or literal, based on a variety of approaches to forming distributed representations. Our results show that a model based on averaging word embeddings performs on par with, or better than, a previously-proposed approach based on skip-thoughts. Idiomatic usages of VNCs are known to exhibit lexico-syntactic fixedness. We further incorporate this information into our models, demonstrating that this rich linguistic knowledge is complementary to the information carried by distributed representations.
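The averaged-embedding representation mentioned above can be sketched in a few lines (an assumption-laden illustration; `embeddings` is a hypothetical word-to-vector lookup such as pretrained word2vec vectors, and the resulting vector would be fed to any binary idiomatic-vs-literal classifier):

```python
import numpy as np

def usage_vector(tokens, embeddings, dim=300):
    """Represent a VNC usage (its sentence) as the mean of the word vectors of
    its in-vocabulary tokens; returns a zero vector if none are found."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```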
2017
Supervised and unsupervised approaches to measuring usage similarity
Milton King | Paul Cook
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
Usage similarity (USim) is an approach to determining word meaning in context that does not rely on a sense inventory. Instead, pairs of usages of a target lemma are rated on a scale. In this paper we propose unsupervised approaches to USim based on embeddings for words, contexts, and sentences, and achieve state-of-the-art results over two USim datasets. We further consider supervised approaches to USim, and find that although they outperform unsupervised approaches, they are unable to generalize to lemmas that are unseen in the training data.
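An unsupervised USim score of this kind can be as simple as the cosine between two usage representations (the paper explores several representations; this sketch assumes generic usage vectors), with system scores then rank-correlated against the gold ratings:

```python
import numpy as np
from scipy.stats import spearmanr

def usim_score(vec1, vec2):
    """Rate a pair of usages by the cosine of their representations."""
    return float(np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2)))

# Evaluation: rank-correlate system scores with gold USim ratings (toy values).
gold = [4.5, 1.0, 3.0]
system = [0.91, 0.22, 0.65]
rho, _ = spearmanr(gold, system)
print(rho)
```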
2016
UNBNLP at SemEval-2016 Task 1: Semantic Textual Similarity: A Unified Framework for Semantic Processing and Evaluation
Milton King | Waseem Gharbieh | SoHyun Park | Paul Cook
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)