Alona Fyshe

2020

From Language to Language-ish: How Brain-Like is an LSTM’s Representation of Nonsensical Language Stimuli?
Maryam Hashemzadeh | Greta Kaufeld | Martha White | Andrea E. Martin | Alona Fyshe
Findings of the Association for Computational Linguistics: EMNLP 2020

The representations generated by many models of language (word embeddings, recurrent neural networks, and transformers) correlate with brain activity recorded while people read. However, these decoding results are usually based on the brain’s reaction to syntactically and semantically sound language stimuli. In this study, we asked: how does an LSTM (long short-term memory) language model, trained (by and large) on semantically and syntactically intact language, represent a language sample with degraded semantic or syntactic information? Does the LSTM representation still resemble the brain’s reaction? We found that, even for some kinds of nonsensical language, there is a statistically significant relationship between the brain’s activity and the representations of an LSTM. This indicates that, at least in some instances, LSTMs and the human brain handle nonsensical data similarly.
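
As a rough illustration of this kind of decoding analysis (a sketch under stated assumptions, not the paper’s actual pipeline), one can fit a ridge regression from LSTM hidden states to brain recordings and score it with per-channel correlations. The `lstm_states` and `brain` arrays below are hypothetical placeholders for real per-word model states and time-aligned recordings.

```python
# Sketch of representation-to-brain decoding with placeholder data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
lstm_states = rng.normal(size=(500, 256))  # hypothetical: 500 words x 256 hidden units
brain = rng.normal(size=(500, 64))         # hypothetical: 500 words x 64 EEG channels

scores = []
for train_idx, test_idx in KFold(n_splits=5).split(lstm_states):
    model = Ridge(alpha=1.0).fit(lstm_states[train_idx], brain[train_idx])
    pred = model.predict(lstm_states[test_idx])
    # Pearson correlation between predicted and observed activity, per channel.
    for ch in range(brain.shape[1]):
        scores.append(np.corrcoef(pred[:, ch], brain[test_idx, ch])[0, 1])

print(f"mean decoding correlation: {np.mean(scores):.3f}")
```

In work of this kind, statistical significance is typically assessed with permutation tests over such scores rather than the raw correlations alone.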

2018

The Emergence of Semantics in Neural Network Representations of Visual Information
Dhanush Dharmaretnam | Alona Fyshe
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Word vector models learn about semantics through corpora. Convolutional Neural Networks (CNNs) can learn about semantics through images. At the most abstract level, some of the information in these models must be shared, as they model the same real-world phenomena. Here we employ techniques previously used to detect semantic representations in the human brain to look for semantic representations in CNNs. We show the accumulation of semantic information across the layers of the CNN, and discover that, for misclassified images, the correct class can be recovered in its intermediate layers.
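
The general probing idea can be sketched as follows (my own illustration with made-up dimensions, not the paper’s data or code): learn a linear map from one CNN layer’s activations to the classes’ word vectors, then check whether the correct class vector is nearest for held-out images.

```python
# Probing one CNN layer for word-vector semantics, with placeholder arrays.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_images, n_classes, act_dim, vec_dim = 200, 10, 512, 300
activations = rng.normal(size=(n_images, act_dim))     # one layer's features per image
class_vectors = rng.normal(size=(n_classes, vec_dim))  # one word vector per class
labels = rng.integers(0, n_classes, size=n_images)

train, test = np.arange(150), np.arange(150, 200)
probe = Ridge(alpha=10.0).fit(activations[train], class_vectors[labels[train]])
pred = probe.predict(activations[test])

# Nearest class vector by cosine similarity; above-chance accuracy suggests
# the layer carries recoverable semantic information.
sims = pred @ class_vectors.T / (
    np.linalg.norm(pred, axis=1, keepdims=True) * np.linalg.norm(class_vectors, axis=1))
accuracy = (sims.argmax(axis=1) == labels[test]).mean()
print(f"probe accuracy at this layer: {accuracy:.2f}")
```

Running the same probe at each layer is what would show semantic information accumulating through the network.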

Social and Emotional Correlates of Capitalization on Twitter
Sophia Chan | Alona Fyshe
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

Social media text is replete with unusual capitalization patterns. We posit that capitalizing a token like THIS performs two expressive functions: it marks a person socially, and it marks certain parts of an utterance as more salient than others. Focusing on gender and sentiment, we use a corpus of tweets to show that capitalization appears in more negative than positive contexts, and is used more by females than by males. Yet we find that both genders use capitalization in a similar way when expressing sentiment.
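
One simple way to operationalize the capitalization measure (a toy illustration of mine, not the paper’s code) is to flag fully capitalized tokens of two or more letters and compute a per-tweet rate:

```python
# Toy per-tweet capitalization rate: share of fully capitalized tokens.
import re

CAPS = re.compile(r"^[A-Z]{2,}$")  # two or more letters, all uppercase

def caps_rate(tweet):
    tokens = tweet.split()
    # Strip edge punctuation so "AGAIN!" still counts as capitalized.
    return sum(bool(CAPS.match(t.strip(".,!?"))) for t in tokens) / max(len(tokens), 1)

for tweet in ["I can NOT believe this happened AGAIN!", "such a lovely day today"]:
    print(f"{caps_rate(tweet):.2f}  {tweet}")
```

Comparing such rates across sentiment labels and author groups gives the kind of contrasts the abstract describes.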

Interpreting Word-Level Hidden State Behaviour of Character-Level LSTM Language Models
Avery Hiebert | Cole Peterson | Alona Fyshe | Nishant Mehta
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

While Long Short-Term Memory networks (LSTMs) and other recurrent neural networks have been successfully applied to character-level language modeling, the hidden state dynamics of these models can be difficult to interpret. We investigate the hidden states of such a model by using the HDBSCAN clustering algorithm to identify points in the text at which the hidden state is similar. Focusing on whitespace characters prior to the beginning of a word reveals interpretable clusters that offer insight into how the LSTM may combine contextual and character-level information to identify parts of speech. We also introduce a method for deriving word vectors from the hidden state representation in order to investigate the word-level knowledge of the model. These word vectors encode meaningful semantic information even for words that appear only once in the training text.
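
The clustering step could look roughly like the sketch below, which runs HDBSCAN (via the hdbscan package) over hidden states collected at pre-word whitespace positions; the state array here is a stand-in for states captured from a real character-level LSTM.

```python
# Cluster LSTM hidden states captured at the space before each word.
import numpy as np
import hdbscan  # pip install hdbscan

rng = np.random.default_rng(2)
# Hypothetical stand-in for hidden states at pre-word whitespace positions.
pre_word_states = rng.normal(size=(1000, 128))

clusterer = hdbscan.HDBSCAN(min_cluster_size=20)
labels = clusterer.fit_predict(pre_word_states)  # -1 marks noise points

n_clusters = labels.max() + 1
print(f"{n_clusters} clusters found; {np.sum(labels == -1)} points left as noise")
# Inspecting which upcoming words fall into each cluster is the kind of
# analysis that can reveal part-of-speech-like structure.
```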

2017

Ensemble Methods for Native Language Identification
Sophia Chan | Maryam Honari Jahromi | Benjamin Benetti | Aazim Lakhani | Alona Fyshe
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Our team, Uvic-NLP, explored and evaluated a variety of lexical features for Native Language Identification (NLI) within the framework of ensemble methods. Using a subset of the highest-performing features, we train Support Vector Machines (SVMs) and Fully Connected Neural Networks (FCNNs) as base classifiers, and test different methods for combining their outputs. Restricting our scope to the closed essay track of the NLI Shared Task 2017, we find that our best SVM ensemble achieves an F1 score of 0.8730 on the test set.
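
A minimal version of such an ensemble, sketched with off-the-shelf scikit-learn components rather than the team’s actual features or system, might combine an SVM over character n-grams and a small fully connected network over word n-grams by majority vote. The essays and L1 labels below are placeholders.

```python
# Toy SVM + FCNN ensemble for native language identification.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

essays = ["toy essay one ...", "toy essay two ...",
          "toy essay three ...", "toy essay four ..."]
langs = ["KOR", "DEU", "KOR", "DEU"]  # hypothetical L1 labels

svm = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
                    LinearSVC())
fcnn = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                     MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))

# Hard voting: each base classifier predicts a label, majority wins.
ensemble = VotingClassifier([("svm", svm), ("fcnn", fcnn)], voting="hard")
ensemble.fit(essays, langs)
print(ensemble.predict(["toy essay five ..."]))
```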

2016

Poet Admits // Mute Cypher: Beam Search to find Mutually Enciphering Poetic Texts
Cole Peterson | Alona Fyshe
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

BrainBench: A Brain-Image Test Suite for Distributional Semantic Models
Haoyan Xu | Brian Murphy | Alona Fyshe
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

A Compositional and Interpretable Semantic Space
Alona Fyshe | Leila Wehbe | Partha P. Talukdar | Brian Murphy | Tom M. Mitchell
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Interpretable Semantic Vectors from a Joint Model of Brain- and Text- Based Meaning
Alona Fyshe | Partha P. Talukdar | Brian Murphy | Tom M. Mitchell
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Documents and Dependencies: an Exploration of Vector Space Models for Semantic Composition
Alona Fyshe | Brian Murphy | Partha Talukdar | Tom Mitchell
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

2006

Term Generalization and Synonym Resolution for Biological Abstracts: Using the Gene Ontology for Subcellular Localization Prediction
Alona Fyshe | Duane Szafron
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology