Kristen Johnson

Also published as: Kristen Marie Johnson


2023

Race, Gender, and Age Biases in Biomedical Masked Language Models
Michelle Kim | Junghwan Kim | Kristen Johnson
Findings of the Association for Computational Linguistics: ACL 2023

Biases cause discrepancies in healthcare services. Race, gender, and age of a patient affect interactions with physicians and the medical treatments one receives. These biases in clinical practices can be amplified following the release of pre-trained language models trained on biomedical corpora. To bring awareness to such repercussions, we examine social biases present in the biomedical masked language models. We curate prompts based on evidence-based practice and compare generated diagnoses based on biases. For a case study, we measure bias in diagnosing coronary artery disease and using cardiovascular procedures based on bias. Our study demonstrates that biomedical models are less biased than BERT in gender, while the opposite is true for race and age.

2022

CLoSE: Contrastive Learning of Subframe Embeddings for Political Bias Classification of News Media
Michelle YoungJin Kim | Kristen Marie Johnson
Proceedings of the 29th International Conference on Computational Linguistics

Framing is a political strategy in which journalists and politicians emphasize certain aspects of a societal issue in order to influence and sway public opinion. Frameworks for detecting framing in news articles or social media posts are critical in understanding the spread of biased information in our society. In this paper, we propose CLoSE, a multi-task BERT-based model which uses contrastive learning to embed indicators of frames from news articles in order to predict political bias. We evaluate the performance of our proposed model on subframes and political bias classification tasks. We also demonstrate the model’s classification accuracy on zero-shot and few-shot learning tasks, providing a promising avenue for framing detection in unlabeled data.

2021

Cryptocurrency Day Trading and Framing Prediction in Microblog Discourse
Anna Paula Pawlicka Maule | Kristen Johnson
Proceedings of the Third Workshop on Economics and Natural Language Processing

With 56 million people actively trading and investing in cryptocurrency online and globally in 2020, there is an increasing need for automatic social media analysis tools to help understand trading discourse and behavior. In this work, we present a dual natural language modeling pipeline which leverages language and social network behaviors for the prediction of cryptocurrency day trading actions and their associated framing patterns. This pipeline first predicts if tweets can be used to guide day trading behavior, specifically if a cryptocurrency investor should buy, sell, or hold their cryptocurrencies in order to make a profit. Next, tweets are input to an unsupervised deep clustering approach to automatically detect trading framing patterns. Our contributions include the modeling pipeline for this novel task, a new Cryptocurrency Tweets Dataset compiled from influential accounts, and a Historical Price Dataset. Our experiments show that our approach achieves an 88.78% accuracy for day trading behavior prediction and reveals framing fluctuations prior to and during the COVID-19 pandemic that could be used to guide investment actions.

2020


Using Social Media For Bitcoin Day Trading Behavior Prediction
Anna Paula Pawlicka Maule | Kristen Johnson
Proceedings of the Fourth Widening Natural Language Processing Workshop

This abstract presents preliminary work in the application of natural language processing techniques and social network modeling for the prediction of cryptocurrency trading and investment behavior. Specifically, we are building models to use language and social network behaviors to predict if the tweets of a 24-hour period can be used to buy or sell cryptocurrency to make a profit. In this paper we present our novel task and initial language modeling studies.

2019

Modeling Behavioral Aspects of Social Media Discourse for Moral Classification
Kristen Johnson | Dan Goldwasser
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

Political discourse on social media microblogs, specifically Twitter, has become an undeniable part of mainstream U.S. politics. Given the length constraint of tweets, politicians must carefully word their statements to ensure their message is understood by their intended audience. This constraint often eliminates the context of the tweet, making automatic analysis of social media political discourse a difficult task. To overcome this challenge, we propose simultaneous modeling of high-level abstractions of political language, such as political slogans and framing strategies, with abstractions of how politicians behave on Twitter. These behavioral abstractions can be further leveraged as forms of supervision in order to increase prediction accuracy, while reducing the burden of annotation. In this work, we use Probabilistic Soft Logic (PSL) to build relational models to capture the similarities in language and behavior that obfuscate political messages on Twitter. When combined, these descriptors reveal the moral foundations underlying the discourse of U.S. politicians online, across differing governing administrations, showing how party talking points remain cohesive or change over time.

2018

Classification of Moral Foundations in Microblog Political Discourse
Kristen Johnson | Dan Goldwasser
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Previous works in computer science, as well as political and social science, have shown correlation in text between political ideologies and the moral foundations expressed within that text. Additional work has shown that policy frames, which are used by politicians to bias the public towards their stance on an issue, are also correlated with political ideology. Based on these associations, this work takes a first step towards modeling both the language and how politicians frame issues on Twitter, in order to predict the moral foundations that are used by politicians to express their stances on issues. The contributions of this work include a dataset annotated for the moral foundations, annotation guidelines, and probabilistic graphical models which show the usefulness of jointly modeling abstract political slogans, as opposed to the unigrams of previous works, with policy frames for the prediction of the morality underlying political tweets.

2017

Leveraging Behavioral and Social Information for Weakly Supervised Collective Classification of Political Discourse on Twitter
Kristen Johnson | Di Jin | Dan Goldwasser
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Framing is a political strategy in which politicians carefully word their statements in order to control public perception of issues. Previous works exploring political framing typically analyze frame usage in longer texts, such as congressional speeches. We present a collection of weakly supervised models which harness collective classification to predict the frames used in political discourse on the microblogging platform, Twitter. Our global probabilistic models show that by combining both lexical features of tweets and network-based behavioral features of Twitter, we are able to increase the average, unsupervised F1 score by 21.52 points over a lexical baseline alone.

PurdueNLP at SemEval-2017 Task 1: Predicting Semantic Textual Similarity with Paraphrase and Event Embeddings
I-Ta Lee | Mahak Goindani | Chang Li | Di Jin | Kristen Marie Johnson | Xiao Zhang | Maria Leonor Pacheco | Dan Goldwasser
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our proposed solution for SemEval 2017 Task 1: Semantic Textual Similarity (Cer et al., 2017). The task aims at measuring the degree of equivalence between sentences given in English. Performance is evaluated by computing Pearson correlation scores between the predicted scores and human judgements. Our proposed system consists of two subsystems and one regression model for predicting STS scores. The two subsystems are designed to learn Paraphrase and Event Embeddings that allow our system to take paraphrasing characteristics and sentence structures into account. The regression model combines these embeddings to make the final predictions. The experimental results show that our system achieves a Pearson correlation score of 0.8 on this task.

Ideological Phrase Indicators for Classification of Political Discourse Framing on Twitter
Kristen Johnson | I-Ta Lee | Dan Goldwasser
Proceedings of the Second Workshop on NLP and Computational Social Science

Politicians carefully word their statements in order to influence how others view an issue, a political strategy called framing. Simultaneously, these frames may also reveal the politician's beliefs or positions on an issue. Simple language features such as unigrams, bigrams, and trigrams are important indicators for identifying the general frame of a text, for both longer congressional speeches and shorter tweets of politicians. However, tweets may contain multiple unigrams across different frames, which limits the effectiveness of this approach. In this paper, we present a joint model which uses both linguistic features of tweets and ideological phrase indicators extracted from a state-of-the-art embedding-based model to predict the general frame of political tweets.

2016

Identifying Stance by Analyzing Political Discourse on Twitter
Kristen Johnson | Dan Goldwasser
Proceedings of the First Workshop on NLP and Computational Social Science

“All I know about politics is what I read in Twitter”: Weakly Supervised Models for Extracting Politicians’ Stances From Twitter
Kristen Johnson | Dan Goldwasser
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

During the 2016 United States presidential election, politicians have increasingly used Twitter to express their beliefs, stances on current political issues, and reactions concerning national and international events. Given the limited length of tweets and the scrutiny politicians face for what they choose or neglect to say, they must craft and time their tweets carefully. The content and delivery of these tweets are therefore highly indicative of a politician’s stances. We present a weakly supervised method for extracting how issues are framed and temporal activity patterns on Twitter for popular politicians and issues of the 2016 election. These behavioral components are combined into a global model which collectively infers the most likely stance and agreement patterns among politicians, with respective accuracies of 86.44% and 84.6% on average.