Joonsuk Park


2021

Argument Mining on Twitter: A Case Study on the Planned Parenthood Debate
Muhammad Mahad Afzal Bhatti | Ahsan Suheer Ahmad | Joonsuk Park
Proceedings of the 8th Workshop on Argument Mining

Twitter is a popular platform to share opinions and claims, which may be accompanied by the underlying rationale. Such information can be invaluable to policy makers, marketers and social scientists, to name a few. However, the effort to mine arguments on Twitter has been limited, mainly because a tweet is typically too short to contain an argument, that is, both a claim and a premise. In this paper, we propose a novel problem formulation to mine arguments from Twitter: we formulate argument mining on Twitter as a text classification task to identify tweets that serve as premises for a hashtag that represents a claim of interest. To demonstrate the efficacy of this formulation, we mine arguments for and against funding Planned Parenthood expressed in tweets. We first present a new dataset of 24,100 tweets containing the hashtag #StandWithPP or #DefundPP, manually labeled as SUPPORT WITH REASON, SUPPORT WITHOUT REASON, and NO EXPLICIT SUPPORT. We then train classifiers to determine the types of tweets, achieving a best performance of 71% F1. Our results show that claim-specific keywords are the most informative features, which in turn reveal prominent arguments for and against funding Planned Parenthood.

Automatic Fact-Checking with Document-level Annotations using BERT and Multiple Instance Learning
Aalok Sathe | Joonsuk Park
Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER)

Automatic fact-checking is crucial for recognizing misinformation spreading on the internet. Most existing fact-checkers break down the process into several subtasks, one of which determines candidate evidence sentences that can potentially support or refute the claim to be verified; typically, evidence sentences with gold-standard labels are needed for this. In a more realistic setting, however, such sentence-level annotations are not available. In this paper, we tackle the natural language inference (NLI) subtask—given a document and a (sentence) claim, determine whether the document supports or refutes the claim—only using document-level annotations. Using fine-tuned BERT and multiple instance learning, we achieve 81.9% accuracy, significantly outperforming the existing results on the WikiFactCheck-English dataset.

2020

Automated Fact-Checking of Claims from Wikipedia
Aalok Sathe | Salar Ather | Tuan Manh Le | Nathan Perry | Joonsuk Park
Proceedings of the 12th Language Resources and Evaluation Conference

Automated fact checking is becoming increasingly vital as both truthful and fallacious information accumulate online. Research on fact checking has benefited from large-scale datasets such as FEVER and SNLI. However, such datasets suffer from limited applicability due to the synthetic nature of claims and/or evidence written by annotators that differ from real claims and evidence on the internet. To this end, we present WikiFactCheck-English, a dataset of 124k+ triples consisting of a claim, context and an evidence document extracted from English Wikipedia articles and citations, as well as 34k+ manually written claims that are refuted by the evidence documents. This is the largest fact checking dataset consisting of real claims and evidence to date; it will allow the development of fact checking systems that can better process claims and evidence in the real world. We also show that for the NLI subtask, a logistic regression system trained using existing and novel features achieves peak accuracy of 68%, providing a competitive baseline for future work. Also, a decomposable attention model trained on SNLI significantly underperforms the models trained on this dataset, suggesting that models trained on manually generated data may not be sufficiently generalizable or suitable for fact checking real-world claims.

2018

A Corpus of eRulemaking User Comments for Measuring Evaluability of Arguments
Joonsuk Park | Claire Cardie
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Argument Mining with Structured SVMs and RNNs
Vlad Niculae | Joonsuk Park | Claire Cardie
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets.

2016

A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues
Barbara Konat | John Lawrence | Joonsuk Park | Katarzyna Budzynska | Chris Reed
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Governments are increasingly utilising online platforms in order to engage with, and ascertain the opinions of, their citizens. Whilst policy makers could potentially benefit from such enormous feedback from society, they first face the challenge of making sense out of the large volumes of data produced. This creates a demand for tools and technologies which will enable governments to quickly and thoroughly digest the points being made and to respond accordingly. By determining the argumentative and dialogical structures contained within a debate, we are able to determine the issues which are divisive and those which attract agreement. This paper proposes a method of graph-based analytics which uses properties of graphs representing networks of arguments pro- & con- in order to automatically analyse issues which divide citizens about new regulations. By future application of the most recent advances in argument mining, the results reported here will have a chance to scale up to enable sense-making of the vast amount of feedback received from citizens on directions that policy should take.

2015

Conditional Random Fields for Identifying Appropriate Types of Support for Propositions in Online User Comments
Joonsuk Park | Arzoo Katiyar | Bishan Yang
Proceedings of the 2nd Workshop on Argumentation Mining

Automatic Identification of Rhetorical Questions
Shohini Bhattasali | Jeremy Cytryn | Elana Feldman | Joonsuk Park
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Identifying Appropriate Support for Propositions in Online User Comments
Joonsuk Park | Claire Cardie
Proceedings of the First Workshop on Argumentation Mining

2012

Improving Implicit Discourse Relation Recognition Through Feature Set Optimization
Joonsuk Park | Claire Cardie
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue