Sarah Shugars


2021

pdf
(Mis)alignment Between Stance Expressed in Social Media Data and Public Opinion Surveys
Kenneth Joseph | Sarah Shugars | Ryan Gallagher | Jon Green | Alexi Quintana Mathé | Zijian An | David Lazer
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Stance detection, which aims to determine whether an individual is for or against a target concept, promises to uncover public opinion from large streams of social media data. Yet even human annotation of social media content does not always capture “stance” as measured by public opinion polls. We demonstrate this by directly comparing an individual’s self-reported stance to the stance inferred from their social media data. Leveraging a longitudinal public opinion survey with respondent Twitter handles, we conducted this comparison for 1,129 individuals across four salient targets. We find that recall is high for both “Pro’’ and “Anti’’ stance classifications but precision is variable in a number of cases. We identify three factors leading to the disconnect between text and author stance: temporal inconsistencies, differences in constructs, and measurement errors from both survey respondents and annotators. By presenting a framework for assessing the limitations of stance detection models, this work provides important insight into what stance detection truly measures.

2018

pdf
Microblog Conversation Recommendation via Joint Modeling of Topics and Discourse
Xingshan Zeng | Jing Li | Lu Wang | Nicholas Beauchamp | Sarah Shugars | Kam-Fai Wong
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Millions of conversations are generated every day on social media platforms. With limited attention, it is challenging for users to select which discussions they would like to participate in. Here we propose a new method for microblog conversation recommendation. While much prior work has focused on post-level recommendation, we exploit both the conversational context, and user content and behavior preferences. We propose a statistical model that jointly captures: (1) topics for representing user interests and conversation content, and (2) discourse modes for describing user replying behavior and conversation dynamics. Experimental results on two Twitter datasets demonstrate that our system outperforms methods that only model content without considering discourse.

2017

pdf
Winning on the Merits: The Joint Effects of Content and Style on Debate Outcomes
Lu Wang | Nick Beauchamp | Sarah Shugars | Kechen Qin
Transactions of the Association for Computational Linguistics, Volume 5

Debate and deliberation play essential roles in politics and government, but most models presume that debates are won mainly via superior style or agenda control. Ideally, however, debates would be won on the merits, as a function of which side has the stronger arguments. We propose a predictive model of debate that estimates the effects of linguistic features and the latent persuasive strengths of different topics, as well as the interactions between the two. Using a dataset of 118 Oxford-style debates, our model’s combination of content (as latent topics) and style (as linguistic features) allows us to predict audience-adjudicated winners with 74% accuracy, significantly outperforming linguistic features alone (66%). Our model finds that winning sides employ stronger arguments, and allows us to identify the linguistic features associated with strong or weak arguments.