Milton King
In this work, we discuss our models that were applied to SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection (Muhammad et al., 2025b). We focused on the English dataset of Track A, which involves determining which emotions the reader of a snippet of text is feeling. We applied three different types of models that vary in their approaches and report our findings on the task's test set. Our models' performance differed from one another, but none of them outperformed the task's baseline model.
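As a hedged illustration of the Track A setup only (not the systems described above), the sketch below trains a toy multi-label emotion classifier with scikit-learn; the label set and training texts are invented.

```python
# Illustrative sketch (not the authors' systems): a simple multi-label
# emotion classifier of the kind used for Track A, where each snippet
# of text can evoke several emotions at once.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise"]  # assumed label set

# Toy training data; a real system would train on the task's labelled corpus.
texts = ["I can't believe we won!", "This is terrifying.", "What a dull day."]
labels = [[0, 0, 1, 0, 1], [0, 1, 0, 0, 0], [0, 0, 0, 1, 0]]  # multi-hot rows

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, labels)
print(clf.predict(["That ending surprised everyone."]))
```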
Phobias, characterized by irrational fears of specific objects or situations, can profoundly affect an individual's quality of life. This research presents a comprehensive investigation into phobia classification, in which we propose a novel dataset of 811,569 English tweets from user timelines spanning 102 phobia subtypes over six months, including 47,614 self-diagnosed phobia users. BERT models were leveraged to differentiate non-phobia from phobia users and to classify users into 65 specific phobia subtypes. The study produced promising results, with a highest F1 score of 78.44% in binary classification (phobic vs. non-phobic user) and 24.01% in multi-class classification (detecting a user's specific phobia subtype). This research provides insights into people with phobias on social media and emphasizes the capacity of natural language processing and machine learning to automate the evaluation and support of mental health.
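A minimal sketch of the BERT fine-tuning step described above, using Hugging Face Transformers; the example texts, labels, and single gradient step are illustrative stand-ins for training on the tweet dataset.

```python
# Hedged sketch: fine-tuning a BERT classifier for binary phobia detection.
# Texts and labels below are toys, not the paper's dataset.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # phobic vs. non-phobic user
)

# Toy batch; a real run would iterate over user timelines.
batch = tokenizer(
    ["I can't go near spiders, it's unbearable.", "Lovely weather today!"],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one gradient step of standard fine-tuning
print(outputs.logits.softmax(dim=-1))
```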
The predominant sense of a lemma can vary based on the timeframe (years, decades, centuries) in which the text was written. In our work, we explore the predominant sense of shorter timeframes (days, months, seasons, etc.) and find that different short timeframes can have predominant senses that differ from each other and from the predominant sense of a corpus. Leveraging the predominant sense and sense distribution of a short timeframe, we design short-timeframe temporal-aware word sense disambiguation (WSD) models that outperform a temporal-agnostic model. Likewise, author-aware WSD models tend to outperform author-agnostic models; we therefore augment our temporal-aware models to leverage knowledge of author-level predominant senses and sense distributions, creating temporal- and author-aware WSD models. In addition, we find that considering recent usages of a lemma by the same author can assist a WSD model. Our approach requires only a small amount of text from authors and timeframes.
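One plausible way to make a WSD model temporal-aware, sketched under the assumption that the approach combines a base model's sense scores with a timeframe's sense distribution; the interpolation below is illustrative, not the paper's exact formulation.

```python
# Assumed sketch: interpolate a temporal-agnostic WSD model's scores with
# the sense distribution observed in the short timeframe (e.g., the month)
# in which the text was written.
from collections import Counter

def temporal_aware_scores(base_scores, timeframe_counts, lam=0.5):
    """base_scores: {sense: score} from a temporal-agnostic WSD model.
    timeframe_counts: Counter of sense frequencies in the timeframe."""
    total = sum(timeframe_counts.values()) or 1
    return {
        sense: (1 - lam) * score + lam * timeframe_counts[sense] / total
        for sense, score in base_scores.items()
    }

base = {"bank.n.01": 0.55, "bank.n.02": 0.45}      # base model scores
june = Counter({"bank.n.02": 9, "bank.n.01": 1})   # senses observed in June
scores = temporal_aware_scores(base, june)
print(max(scores, key=scores.get))                 # the timeframe prior flips the decision
```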
In this paper, we explore three unsupervised learning models that we applied to Task 9: BRAINTEASER of SemEval 2024. Two of these models incorporate word sense disambiguation and part-of-speech tagging, specifically leveraging SensEmBERT and the Stanford log-linear part-of-speech tagger. Our third model relies on a more traditional language modelling approach. The best-performing model, a bag-of-words model leveraging word sense disambiguation and part-of-speech tagging, secured 10th place out of 11 on both the sentence-puzzle and word-puzzle subtasks.
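A toy sketch of the bag-of-words component only; the SensEmBERT sense embeddings and the Stanford POS tagger used by the full systems are omitted here, and the stopword list and puzzle are invented.

```python
# Toy bag-of-words scorer for a multiple-choice puzzle: pick the candidate
# answer that shares the most content words with the question. The full
# systems additionally used sense embeddings and POS tags.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "what", "in", "you", "can"}

def bow(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def best_answer(question, candidates):
    q = bow(question)
    return max(candidates, key=lambda c: len(q & bow(c)))

question = "What can you catch but not throw?"
candidates = ["a cold you catch in winter", "a heavy ball", "a long rope"]
print(best_answer(question, candidates))
```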
In this paper, we discuss our models applied to Task 4: Human Value Detection of SemEval 2023, which incorporated two different embedding techniques to interpret the data. Preliminary experiments were conducted to observe important word types. Subsequently, we explored an XGBoost model, an unsupervised learning model, and two ensemble learning models. The best-performing model, an ensemble model employing a soft-voting technique, secured 34th place out of 39 teams on a class-imbalanced dataset. We explored the inclusion of different parts of the provided knowledge resource and found that considering only specific parts assisted our models.
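A minimal sketch of the soft-voting idea with scikit-learn; the member classifiers and the toy human-value labels below are illustrative stand-ins, not the paper's exact models.

```python
# Soft voting: average the predicted class probabilities of several
# classifiers and pick the highest-probability class.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["security matters most", "be kind to others",
         "tradition guides us", "freedom above all"]
labels = [0, 1, 2, 3]  # toy human-value classes

ensemble = make_pipeline(
    CountVectorizer(),
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
        ],
        voting="soft",  # average class probabilities instead of hard votes
    ),
)
ensemble.fit(texts, labels)
print(ensemble.predict(["others deserve kindness"]))
```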
SemEval-2023’s Task 1, Visual Word Sense Disambiguation, is a task involving both text semantics and visual semantics: selecting, from a list of candidates, the image that best exhibits a given target word in a small context. We tried several methods, including an image-captioning method and CLIP-based methods, and submitted our predictions in the competition for this task. This paper describes the methods we used, reports their performance, and provides an analysis and discussion of the results.
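A hedged sketch of a CLIP-based method: embed the target word in context and each candidate image in CLIP's joint space, then pick the highest-scoring image. The checkpoint choice and the dummy solid-colour images are assumptions.

```python
# Sketch of CLIP-based visual WSD: rank candidate images by their
# similarity to the target word in its small context.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

context = "andromeda tree"  # target word in its small context
# Dummy images so the sketch runs; real candidates come from the task data.
images = [Image.new("RGB", (224, 224), c) for c in ("green", "blue")]

inputs = processor(text=[context], images=images,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
# logits_per_image holds one image-text similarity score per candidate.
best = outputs.logits_per_image.argmax().item()
print(f"best candidate image: {best}")
```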
Authors of text tend to predominantly use a single sense for a lemma, and that sense can differ among authors. This might not be captured by an author-agnostic word sense disambiguation (WSD) model trained on multiple authors. Our work finds that WordNet’s first senses, the predominant senses of our dataset’s genre, and the predominant senses of an author can all differ; therefore, author-agnostic models could perform well over the entire dataset but poorly on individual authors. In this work, we explore methods for personalizing WSD models by tailoring existing state-of-the-art models toward an individual by exploiting the author’s sense distributions. We propose a novel WSD dataset and show that personalizing a WSD system with knowledge of an author’s sense distributions or predominant senses can greatly increase its performance.
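One possible personalization scheme, sketched as an assumption rather than the paper's exact method: back off to the author's predominant sense when the author-agnostic model is unconfident.

```python
# Assumed illustration of personalized WSD: trust the base model only when
# it is confident; otherwise defer to the author's predominant sense.
def personalized_sense(base_scores, author_predominant, threshold=0.6):
    """base_scores: {sense: probability} from an author-agnostic model.
    author_predominant: the sense this author uses most for the lemma."""
    best_sense = max(base_scores, key=base_scores.get)
    if base_scores[best_sense] >= threshold:
        return best_sense        # confident base prediction stands
    return author_predominant    # otherwise use the author prior

scores = {"star.n.01": 0.45, "star.n.02": 0.40, "star.n.03": 0.15}
print(personalized_sense(scores, author_predominant="star.n.02"))
```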
In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1. We explore the use of statistical baseline features, masked language models, and character-level encoders to predict the complexity of a target token in context. Our best system combines information from these three sources. The results indicate that information from masked language models and character-level encoders can be combined to improve lexical complexity prediction.
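A minimal sketch of combining the three information sources: concatenate the feature vectors and fit a regressor. The feature extractors below are toy stand-ins for the real baseline features, masked language model, and character-level encoder.

```python
# Sketch of feature combination for lexical complexity prediction.
# Each extractor is a placeholder producing one or two numbers per token.
import numpy as np
from sklearn.linear_model import Ridge

def baseline_feats(token):   # e.g., token length and vowel ratio
    vowels = sum(ch in "aeiou" for ch in token)
    return [len(token), vowels / max(len(token), 1)]

def mlm_feat(token):         # stand-in for a masked-LM probability
    return [1.0 / (1 + len(token))]

def char_feat(token):        # stand-in for a character-encoder score
    return [len(set(token)) / max(len(token), 1)]

tokens = ["cat", "photosynthesis", "river", "sesquipedalian"]
complexities = [0.05, 0.70, 0.10, 0.85]  # toy gold complexity scores

X = np.array([baseline_feats(t) + mlm_feat(t) + char_feat(t) for t in tokens])
model = Ridge().fit(X, complexities)
print(model.predict(X).round(2))
```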
In this work, we consider the problem of personalizing language models, that is, building language models that are tailored to the writing style of an individual. Because training language models requires a large amount of text, and individuals do not necessarily possess a large corpus of their writing that could be used for training, approaches to personalizing language models must be able to rely on only a small amount of text from any one user. In this work, we compare three approaches to personalizing a language model that was trained on a large background corpus using a relatively small amount of text from an individual user. We evaluate these approaches using perplexity, as well as two measures based on next word prediction for smartphone soft keyboards. Our results show that when only a small amount of user-specific text is available, an approach based on priming gives the most improvement, while when larger amounts of user-specific text are available, an approach based on language model interpolation performs best. We carry out further experiments to show that these approaches to personalization outperform language model adaptation based on demographic factors.
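A sketch of the interpolation approach, P(w|h) = λ·P_user(w|h) + (1−λ)·P_background(w|h), using toy unigram distributions in place of real language models; the vocabulary and λ value are invented.

```python
# Linear language-model interpolation: mix a user-specific distribution
# with a background distribution estimated from a large corpus.
def interpolate(p_user, p_background, lam=0.3):
    vocab = set(p_user) | set(p_background)
    return {w: lam * p_user.get(w, 0.0) + (1 - lam) * p_background.get(w, 0.0)
            for w in vocab}

p_background = {"the": 0.5, "cat": 0.3, "pugs": 0.2}  # toy background LM
p_user = {"pugs": 0.6, "the": 0.4}  # this user often writes about pugs

p = interpolate(p_user, p_background)
print(max(p, key=p.get), round(p["pugs"], 2))  # user text boosts "pugs"
```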
In this paper we apply a range of approaches to language modeling – including word-level n-gram and neural language models, and character-level neural language models – to the problem of detecting hate speech and offensive language. Our findings indicate that language models are able to capture knowledge of whether text is hateful or offensive. However, our findings also indicate that more conventional approaches to text classification often perform similarly or better.
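An assumed illustration of the language-model classifier: train one model per class and label a text by which model assigns it higher probability. The add-one unigram models below are toys standing in for the paper's n-gram and neural LMs.

```python
# Classify by class-conditional language models: a text is labelled with
# the class whose LM finds it least surprising.
import math
from collections import Counter

def unigram_lm(corpus):
    counts = Counter(w for text in corpus for w in text.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)  # add-one smoothing

def log_prob(text, lm):
    return sum(math.log(lm(w)) for w in text.lower().split())

offensive_lm = unigram_lm(["you are awful", "awful awful people"])
neutral_lm = unigram_lm(["have a nice day", "lovely weather today"])

text = "what an awful person"
label = ("offensive" if log_prob(text, offensive_lm) > log_prob(text, neutral_lm)
         else "neutral")
print(label)
```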
Verb-noun combinations (VNCs) – e.g., blow the whistle, hit the roof, and see stars – are a common type of English idiom whose idiomatic usages are ambiguous with literal usages. In this paper we propose and evaluate models for classifying VNC usages as idiomatic or literal, based on a variety of approaches to forming distributed representations. Our results show that a model based on averaging word embeddings performs on par with, or better than, a previously-proposed approach based on skip-thoughts. Idiomatic usages of VNCs are known to exhibit lexico-syntactic fixedness. We further incorporate this information into our models, demonstrating that this rich linguistic knowledge is complementary to the information carried by distributed representations.
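A sketch of the averaged-embedding representation: each VNC usage is represented by the mean of its sentence's word vectors and fed to a classifier. Random vectors stand in here for real pre-trained embeddings, and the training sentences are toys.

```python
# Average word embeddings to represent a VNC usage in context, then
# classify the usage as literal (0) or idiomatic (1).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = {}  # stand-in for word2vec/GloVe lookups
def vec(word):
    return embeddings.setdefault(word, rng.normal(size=50))

def sentence_vec(sentence):
    return np.mean([vec(w) for w in sentence.lower().split()], axis=0)

sentences = ["the referee blew the whistle", "she blew the whistle on fraud",
             "he hit the roof of the car", "dad hit the roof when he saw it"]
labels = [0, 1, 0, 1]  # 0 = literal, 1 = idiomatic

X = np.stack([sentence_vec(s) for s in sentences])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict([sentence_vec("she blew the whistle on fraud")]))
```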
In this paper we present three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency. We show that, of these approaches, the simple approach based on word co-occurrence performs best. We further consider supervised and unsupervised approaches to combining information from these models, but these approaches do not improve on the word co-occurrence model.
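A toy sketch of the sentence-level co-occurrence model: an attribute is taken as discriminative for word1 (vs. word2) if it co-occurs with word1 in more sentences. The three-sentence corpus is a placeholder.

```python
# Count sentence-level co-occurrences to decide whether an attribute
# discriminates one word from another.
corpus = [
    "the banana is yellow and sweet",
    "a ripe banana turns yellow",
    "the cucumber is green and crisp",
]

def cooccurrence(word, attribute, sentences):
    return sum(word in s.split() and attribute in s.split() for s in sentences)

def is_discriminative(word1, word2, attribute, sentences):
    return (cooccurrence(word1, attribute, sentences)
            > cooccurrence(word2, attribute, sentences))

print(is_discriminative("banana", "cucumber", "yellow", corpus))  # True
```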
Usage similarity (USim) is an approach to determining word meaning in context that does not rely on a sense inventory. Instead, pairs of usages of a target lemma are rated on a scale. In this paper we propose unsupervised approaches to USim based on embeddings for words, contexts, and sentences, and achieve state-of-the-art results over two USim datasets. We further consider supervised approaches to USim, and find that although they outperform unsupervised approaches, they are unable to generalize to lemmas that are unseen in the training data.
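A hedged sketch of an embedding-based USim rater: score a pair of usages by the cosine similarity of the target lemma's contextual vectors. The BERT checkpoint and the single-wordpiece token lookup are simplifying assumptions.

```python
# Rate usage similarity as the cosine of two contextual target-word vectors.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def target_vec(sentence, target):
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(target)]  # assumes target is one wordpiece

u1 = target_vec("she sat by the bank of the river", "bank")
u2 = target_vec("he deposited cash at the bank", "bank")
print(torch.cosine_similarity(u1, u2, dim=0).item())
```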