Michael Lepori


2020

pdf bib
Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis
Michael Lepori
Proceedings of the 28th International Conference on Computational Linguistics

We present a new approach for detecting human-like social biases in word embeddings using representational similarity analysis. Specifically, we probe contextualized and non-contextualized embeddings for evidence of intersectional biases against Black women. We show that these embeddings represent Black women as simultaneously less feminine than White women, and less Black than Black men. This finding aligns with intersectionality theory, which argues that multiple identity categories (such as race or sex) layer on top of each other in order to create unique modes of discrimination that are not shared by any individual category.

pdf bib
Picking BERT’s Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
Michael Lepori | R. Thomas McCoy
Proceedings of the 28th International Conference on Computational Linguistics

As the name implies, contextualized representations of language are typically motivated by their ability to encode context. Which aspects of context are captured by such representations? We introduce an approach to address this question using Representational Similarity Analysis (RSA). As case studies, we investigate the degree to which a verb embedding encodes the verb’s subject, a pronoun embedding encodes the pronoun’s antecedent, and a full-sentence representation encodes the sentence’s head word (as determined by a dependency parse). In all cases, we show that BERT’s contextualized embeddings reflect the linguistic dependency being studied, and that BERT encodes these dependencies to a greater degree than it encodes less linguistically-salient controls. These results demonstrate the ability of our approach to adjudicate between hypotheses about which aspects of context are encoded in representations of language.

pdf bib
Representations of Syntax [MASK] Useful: Effects of Constituency and Dependency Structure in Recursive LSTMs
Michael Lepori | Tal Linzen | R. Thomas McCoy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Sequence-based neural networks show significant sensitivity to syntactic structure, but they still perform less well on syntactic tasks than tree-based networks. Such tree-based networks can be provided with a constituency parse, a dependency parse, or both. We evaluate which of these two representational schemes more effectively introduces biases for syntactic structure that increase performance on the subject-verb agreement prediction task. We find that a constituency-based network generalizes more robustly than a dependency-based one, and that combining the two types of structure does not yield further improvement. Finally, we show that the syntactic robustness of sequential models can be substantially improved by fine-tuning on a small amount of constructed data, suggesting that data augmentation is a viable alternative to explicit constituency structure for imparting the syntactic biases that sequential models are lacking.