Li Lucy

Also published as: Lucy Li

2022

Discovering Differences in the Representation of People using Contextualized Semantic Axes
Li Lucy | Divya Tadimeti | David Bamman
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

A common paradigm for identifying semantic differences across social and temporal contexts is the use of static word embeddings and their distances. In particular, past work has compared embeddings against “semantic axes” that represent two opposing concepts. We extend this paradigm to BERT embeddings, and construct contextualized axes that mitigate the pitfall where antonyms have neighboring representations. We validate and demonstrate these axes on two people-centric datasets: occupations from Wikipedia, and multi-platform discussions in extremist, men’s communities over fourteen years. In both studies, contextualized semantic axes can characterize differences among instances of the same word type. In the latter study, we show that references to women and the contexts around them have become more detestable over time.

2021

Characterizing English Variation across Social Media Communities with BERT
Li Lucy | David Bamman
Transactions of the Association for Computational Linguistics, Volume 9

Much previous work characterizing language variation across Internet social groups has focused on the types of words used by these groups. We extend this type of study by employing BERT to characterize variation in the senses of words as well, analyzing two months of English comments in 474 Reddit communities. The specificity of different sense clusters to a community, combined with the specificity of a community’s unique word types, is used to identify cases where a social group’s language deviates from the norm. We validate our metrics using user-created glossaries and draw on sociolinguistic theories to connect language variation with trends in community behavior. We find that communities with highly distinctive language are medium-sized, and their loyal and highly engaged users interact in dense networks.

Gender and Representation Bias in GPT-3 Generated Stories
Li Lucy | David Bamman
Proceedings of the Third Workshop on Narrative Understanding

Using topic modeling and lexicon-based word similarity, we find that stories generated by GPT-3 exhibit many known gender stereotypes. Generated stories depict different topics and descriptions depending on GPT-3’s perceived gender of the character in a prompt, with feminine characters more likely to be associated with family and appearance, and described as less powerful than masculine characters, even when associated with high power verbs in a prompt. Our study raises questions on how one can avoid unintended social biases when using large language models for storytelling.

Proceedings of the Fifth Workshop on Teaching NLP
David Jurgens | Varada Kolhatkar | Lucy Li | Margot Mieskes | Ted Pedersen
Proceedings of the Fifth Workshop on Teaching NLP

2019

Using Sentiment Induction to Understand Variation in Gendered Online Communities
Li Lucy | Julia Mendelsohn
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

2017

Are Distributional Representations Ready for the Real World? Evaluating Word Vectors for Grounded Perceptual Meaning
Li Lucy | Jon Gauthier
Proceedings of the First Workshop on Language Grounding for Robotics

Distributional word representation methods exploit word co-occurrences to build compact vector encodings of words. While these representations enjoy widespread use in modern natural language processing, it is unclear whether they accurately encode all necessary facets of conceptual meaning. In this paper, we evaluate how well these representations can predict perceptual and conceptual features of concrete concepts, drawing on two semantic norm datasets sourced from human participants. We find that several standard word representations fail to encode many salient perceptual features of concepts, and show that these deficits correlate with word-word similarity prediction errors. Our analyses provide motivation for grounded and embodied language learning approaches, which may help to remedy these deficits.

Co-authors

Varada Kolhatkar 1

Margot Mieskes 1

Ted Pedersen 1

Venues

teachingnlp 1