Catherine Chen


2022

pdf
Attention weights accurately predict language representations in the brain
Mathis Lamarre | Catherine Chen | Fatma Deniz
Findings of the Association for Computational Linguistics: EMNLP 2022

In Transformer-based language models (LMs) the attention mechanism converts token embeddings into contextual embeddings that incorporate information from neighboring words. The resulting contextual hidden state embeddings have enabled highly accurate models of brain responses, suggesting that the attention mechanism constructs contextual embeddings that carry information reflected in language-related brain representations. However, it is unclear whether the attention weights that are used to integrate information across words are themselves related to language representations in the brain. To address this question we analyzed functional magnetic resonance imaging (fMRI) recordings of participants reading English language narratives. We provided the narrative text as input to two LMs (BERT and GPT-2) and extracted their corresponding attention weights. We then used encoding models to determine how well attention weights can predict recorded brain responses. We find that attention weights accurately predict brain responses in much of the frontal and temporal cortices. Our results suggest that the attention mechanism itself carries information that is reflected in brain representations. Moreover, these results indicate cortical areas in which context integration may occur.

2021

pdf
Constructing Taxonomies from Pretrained Language Models
Catherine Chen | Kevin Lin | Dan Klein
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We present a method for constructing taxonomic trees (e.g., WordNet) using pretrained language models. Our approach is composed of two modules, one that predicts parenthood relations and another that reconciles those pairwise predictions into trees. The parenthood prediction module produces likelihood scores for each potential parent-child pair, creating a graph of parent-child relation scores. The tree reconciliation module treats the task as a graph optimization problem and outputs the maximum spanning tree of this graph. We train our model on subtrees sampled from WordNet, and test on nonoverlapping WordNet subtrees. We show that incorporating web-retrieved glosses can further improve performance. On the task of constructing subtrees of English WordNet, the model achieves 66.7 ancestor F1, a 20.0% relative increase over the previous best published result on this task. In addition, we convert the original English dataset into nine other languages using Open Multilingual WordNet and extend our results across these languages.