Gabriella Chronis


longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks.
Venelin Kovatchev | Trina Chatterjee | Venkata S Govindarajan | Jifan Chen | Eunsol Choi | Gabriella Chronis | Anubrata Das | Katrin Erk | Matthew Lease | Junyi Jessy Li | Yating Wu | Kyle Mahowald
Proceedings of the First Workshop on Dynamic Adversarial Data Collection

Developing methods to adversarially challenge NLP systems is a promising avenue for improving both model performance and interpretability. Here, we describe the approach of the team “longhorns” on Task 1 of the The First Workshop on Dynamic Adversarial Data Collection (DADC), which asked teams to manually fool a model on an Extractive Question Answering task. Our team finished first (pending validation), with a model error rate of 62%. We advocate for a systematic, linguistically informed approach to formulating adversarial questions, and we describe the results of our pilot experiments, as well as our official submission.


When is a bishop not like a rook? When it’s like a rabbi! Multi-prototype BERT embeddings for estimating semantic relationships
Gabriella Chronis | Katrin Erk
Proceedings of the 24th Conference on Computational Natural Language Learning

This paper investigates contextual language models, which produce token representations, as a resource for lexical semantics at the word or type level. We construct multi-prototype word embeddings from bert-base-uncased (Devlin et al., 2018). These embeddings retain contextual knowledge that is critical for some type-level tasks, while being less cumbersome and less subject to outlier effects than exemplar models. Similarity and relatedness estimation, both type-level tasks, benefit from this contextual knowledge, indicating the context-sensitivity of these processes. BERT’s token level knowledge also allows the testing of a type-level hypothesis about lexical abstractness, demonstrating the relationship between token-level phenomena and type-level concreteness ratings. Our findings provide important insight into the interpretability of BERT: layer 7 approximates semantic similarity, while the final layer (11) approximates relatedness.