Alyssa Lees


SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification
Elisabetta Fersini | Francesca Gasparini | Giulia Rizzi | Aurora Saibene | Berta Chulvi | Paolo Rosso | Alyssa Lees | Jeffrey Sorensen
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

The paper describes the SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification (MAMI),which explores the detection of misogynous memes on the web by taking advantage of available texts and images. The task has been organised in two related sub-tasks: the first one is focused on recognising whether a meme is misogynous or not (Sub-task A), while the second one is devoted to recognising types of misogyny (Sub-task B). MAMI has been one of the most popular tasks at SemEval-2022 with more than 400 participants, 65 teams involved in Sub-task A and 41 in Sub-task B from 13 countries. The MAMI challenge received 4214 submitted runs (of which 166 uploaded on the leader-board), denoting an enthusiastic participation for the proposed problem.The collection and annotation is described for the task dataset.The paper provides an overview of the systems proposed for the challenge, reports the results achieved in both sub-tasks and outlines a description of the main errors for a comprehension of the systems capabilities and for detailing future research perspectives.

Lost in Distillation: A Case Study in Toxicity Modeling
Alyssa Chvasta | Alyssa Lees | Jeffrey Sorensen | Lucy Vasserman | Nitesh Goyal
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

In an era of increasingly large pre-trained language models, knowledge distillation is a powerful tool for transferring information from a large model to a smaller one. In particular, distillation is of tremendous benefit when it comes to real-world constraints such as serving latency or serving at scale. However, a loss of robustness in language understanding may be hidden in the process and not immediately revealed when looking at high-level evaluation metrics. In this work, we investigate the hidden costs: what is “lost in distillation”, especially in regards to identity-based model bias using the case study of toxicity modeling. With reproducible models using open source training sets, we investigate models distilled from a BERT teacher baseline. Using both open source and proprietary big data models, we investigate these hidden performance costs.


ReasonBERT: Pre-trained to Reason with Distant Supervision
Xiang Deng | Yu Su | Alyssa Lees | You Wu | Cong Yu | Huan Sun
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We present ReasonBert, a pre-training method that augments language models with the ability to reason over long-range relations and multiple, possibly hybrid contexts. Unlike existing pre-training methods that only harvest learning signals from local contexts of naturally occurring texts, we propose a generalized notion of distant supervision to automatically connect multiple pieces of text and tables to create pre-training examples that require long-range reasoning. Different types of reasoning are simulated, including intersecting multiple pieces of evidence, bridging from one piece of evidence to another, and detecting unanswerable cases. We conduct a comprehensive evaluation on a variety of extractive question answering datasets ranging from single-hop to multi-hop and from text-only to table-only to hybrid that require various reasoning capabilities and show that ReasonBert achieves remarkable improvement over an array of strong baselines. Few-shot experiments further demonstrate that our pre-training method substantially improves sample efficiency.

Capturing Covertly Toxic Speech via Crowdsourcing
Alyssa Lees | Daniel Borkan | Ian Kivlichan | Jorge Nario | Tesh Goyal
Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing

We study the task of labeling covert or veiled toxicity in online conversations. Prior research has highlighted the difficulty in creating language models that recognize nuanced toxicity such as microaggressions. Our investigations further underscore the difficulty in parsing such labels reliably from raters via crowdsourcing. We introduce an initial dataset, COVERTTOXICITY, which aims to identify and categorize such comments from a refined rater template. Finally, we fine-tune a comment-domain BERT model to classify covertly offensive comments and compare against existing baselines.


Embedding Semantic Taxonomies
Alyssa Lees | Chris Welty | Shubin Zhao | Jacek Korycki | Sara Mc Carthy
Proceedings of the 28th International Conference on Computational Linguistics

A common step in developing an understanding of a vertical domain, e.g. shopping, dining, movies, medicine, etc., is curating a taxonomy of categories specific to the domain. These human created artifacts have been the subject of research in embeddings that attempt to encode aspects of the partial ordering property of taxonomies. We compare Box Embeddings, a natural containment representation of category taxonomies, to partial-order embeddings and a baseline Bayes Net, in the context of representing the Medical Subject Headings (MeSH) taxonomy given a set of 300K PubMed articles with subject labels from MeSH. We deeply explore the experimental properties of training box embeddings, including preparation of the training data, sampling ratios and class balance, initialization strategies, and propose a fix to the original box objective. We then present first results in using these techniques for representing a bipartite learning problem (i.e. collaborative filtering) in the presence of taxonomic relations within each partition, inferring disease (anatomical) locations from their use as subject labels in journal articles. Our box model substantially outperforms all baselines for taxonomic reconstruction and bipartite relationship experiments. This performance improvement is observed both in overall accuracy and the weighted spread by true taxonomic depth.