Peter Potash


2023

pdf
Prompt Discriminative Language Models for Domain Adaptation
Keming Lu | Peter Potash | Xihui Lin | Yuwen Sun | Zihan Qian | Zheng Yuan | Tristan Naumann | Tianxi Cai | Junwei Lu
Proceedings of the 5th Clinical Natural Language Processing Workshop

Prompt tuning offers an efficient approach to domain adaptation for pretrained language models, which predominantly focus on masked language modeling or generative objectives. However, the potential of discriminative language models in biomedical tasks remains underexplored.To bridge this gap, we develop BioDLM, a method tailored for biomedical domain adaptation of discriminative language models that incorporates prompt-based continual pretraining and prompt tuning for downstream tasks. BioDLM aims to maximize the potential of discriminative language models in low-resource scenarios by reformulating these tasks as span-level corruption detection, thereby enhancing performance on domain-specific tasks and improving the efficiency of continual pertaining. In this way, BioDLM provides a data-efficient domain adaptation method for discriminative language models, effectively enhancing performance on discriminative tasks within the biomedical domain.

2019

pdf
Ranking Passages for Argument Convincingness
Peter Potash | Adam Ferguson | Timothy J. Hazen
Proceedings of the 6th Workshop on Argument Mining

In data ranking applications, pairwise annotation is often more consistent than cardinal annotation for learning ranking models. We examine this in a case study on ranking text passages for argument convincingness. Our task is to choose text passages that provide the highest-quality, most-convincing arguments for opposing sides of a topic. Using data from a deployed system within the Bing search engine, we construct a pairwise-labeled dataset for argument convincingness that is substantially more comprehensive in topical coverage compared to existing public resources. We detail the process of extracting topical passages for queries submitted to a search engine, creating annotated sets of passages aligned to different stances on a topic, and assessing argument convincingness of passages using pairwise annotation. Using a state-of-the-art convincingness model, we evaluate several methods for using pairwise-annotated data examples to train models for ranking passages. Our results show pairwise training outperforms training that regresses to a target score for each passage. Our results also show a simple ‘win-rate’ score is a better regression target than the previously proposed page-rank target. Lastly, addressing the need to filter noisy crowd-sourced annotations when constructing a dataset, we show that filtering for transitivity within pairwise annotations is more effective than filtering based on annotation confidence measures for individual examples.

2018

pdf
Evaluating Creative Language Generation: The Case of Rap Lyric Ghostwriting
Peter Potash | Alexey Romanov | Anna Rumshisky
Proceedings of the Second Workshop on Stylistic Variation

Language generation tasks that seek to mimic human ability to use language creatively are difficult to evaluate, since one must consider creativity, style, and other non-trivial aspects of the generated text. The goal of this paper is to develop evaluations methods for one such task, ghostwriting of rap lyrics, and to provide an explicit, quantifiable foundation for the goals and future directions for this task. Ghostwriting must produce text that is similar in style to the emulated artist, yet distinct in content. We develop a novel evaluation methodology that addresses several complementary aspects of this task, and illustrate how such evaluation can be used to meaning fully analyze system performance. We provide a corpus of lyrics for 13 rap artists, annotated for stylistic similarity, which allows us to assess the feasibility of manual evaluation for generated verse.

2017

pdf
Tracking Bias in News Sources Using Social Media: the Russia-Ukraine Maidan Crisis of 2013–2014
Peter Potash | Alexey Romanov | Mikhail Gronas | Anna Rumshisky | Mikhail Gronas
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

This paper addresses the task of identifying the bias in news articles published during a political or social conflict. We create a silver-standard corpus based on the actions of users in social media. Specifically, we reconceptualize bias in terms of how likely a given article is to be shared or liked by each of the opposing sides. We apply our methodology to a dataset of links collected in relation to the Russia-Ukraine Maidan crisis from 2013-2014. We show that on the task of predicting which side is likely to prefer a given article, a Naive Bayes classifier can record 90.3% accuracy looking only at domain names of the news sources. The best accuracy of 93.5% is achieved by a feed forward neural network. We also apply our methodology to gold-labeled set of articles annotated for bias, where the aforementioned Naive Bayes classifier records 82.6% accuracy and a feed-forward neural networks records 85.6% accuracy.

pdf
Here’s My Point: Joint Pointer Architecture for Argument Mining
Peter Potash | Alexey Romanov | Anna Rumshisky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In order to determine argument structure in text, one must understand how individual components of the overall argument are linked. This work presents the first neural network-based approach to link extraction in argument mining. Specifically, we propose a novel architecture that applies Pointer Network sequence-to-sequence attention modeling to structural prediction in discourse parsing tasks. We then develop a joint model that extends this architecture to simultaneously address the link extraction task and the classification of argument components. The proposed joint model achieves state-of-the-art results on two separate evaluation corpora, showing far superior performance than the previously proposed corpus-specific and heavily feature-engineered models. Furthermore, our results demonstrate that jointly optimizing for both tasks is crucial for high performance.

pdf
Towards Debate Automation: a Recurrent Model for Predicting Debate Winners
Peter Potash | Anna Rumshisky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper we introduce a practical first step towards the creation of an automated debate agent: a state-of-the-art recurrent predictive model for predicting debate winners. By having an accurate predictive model, we are able to objectively rate the quality of a statement made at a specific turn in a debate. The model is based on a recurrent neural network architecture with attention, which allows the model to effectively account for the entire debate when making its prediction. Our model achieves state-of-the-art accuracy on a dataset of debate transcripts annotated with audience favorability of the debate teams. Finally, we discuss how future work can leverage our proposed model for the creation of an automated debate agent. We accomplish this by determining the model input that will maximize audience favorability toward a given side of a debate at an arbitrary turn.

pdf
Length, Interchangeability, and External Knowledge: Observations from Predicting Argument Convincingness
Peter Potash | Robin Bhattacharya | Anna Rumshisky
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this work, we provide insight into three key aspects related to predicting argument convincingness. First, we explicitly display the power that text length possesses for predicting convincingness in an unsupervised setting. Second, we show that a bag-of-words embedding model posts state-of-the-art on a dataset of arguments annotated for convincingness, outperforming an SVM with numerous hand-crafted features as well as recurrent neural network models that attempt to capture semantic composition. Finally, we assess the feasibility of integrating external knowledge when predicting convincingness, as arguments are often more convincing when they contain abundant information and facts. We finish by analyzing the correlations between the various models we propose.

pdf
SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor
Peter Potash | Alexey Romanov | Anna Rumshisky
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes a new shared task for humor understanding that attempts to eschew the ubiquitous binary approach to humor detection and focus on comparative humor ranking instead. The task is based on a new dataset of funny tweets posted in response to shared hashtags, collected from the ‘Hashtag Wars’ segment of the TV show @midnight. The results are evaluated in two subtasks that require the participants to generate either the correct pairwise comparisons of tweets (subtask A), or the correct ranking of the tweets (subtask B) in terms of how funny they are. 7 teams participated in subtask A, and 5 teams participated in subtask B. The best accuracy in subtask A was 0.675. The best (lowest) rank edit distance for subtask B was 0.872.

2016

pdf
SimiHawk at SemEval-2016 Task 1: A Deep Ensemble System for Semantic Textual Similarity
Peter Potash | William Boag | Alexey Romanov | Vasili Ramanishka | Anna Rumshisky
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf
GhostWriter: Using an LSTM for Automatic Rap Lyric Generation
Peter Potash | Alexey Romanov | Anna Rumshisky
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
TwitterHawk: A Feature Bucket Based Approach to Sentiment Analysis
William Boag | Peter Potash | Anna Rumshisky
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)