Smaranda Muresan

2021

pdf bib abs
“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space
Debanjan Ghosh | Ritvik Shrivastava | Smaranda Muresan
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Detecting arguments in online interactions is useful to understand how conflicts arise and get resolved. Users often use figurative language, such as sarcasm, either as persuasive devices or to attack the opponent by an ad hominem argument. To further our understanding of the role of sarcasm in shaping the disagreement space, we present a thorough experimental setup using a corpus annotated with both argumentative moves (agree/disagree) and sarcasm. We exploit joint modeling in terms of (a) applying discrete features that are useful in detecting sarcasm to the task of argumentative relation classification (agree/disagree/none), and (b) multitask learning for argumentative relation classification and sarcasm detection using deep learning architectures (e.g., dual Long Short-Term Memory (LSTM) with hierarchical attention and Transformer-based architectures). We demonstrate that modeling sarcasm improves the argumentative relation classification task (agree/disagree/none) in all setups.

pdf bib
Figurative Language in Recognizing Textual Entailment
Tuhin Chakrabarty | Debanjan Ghosh | Adam Poliak | Smaranda Muresan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Multi-Task Learning and Adapted Knowledge Models for Emotion-Cause Extraction
Elsbeth Turcan | Shuai Wang | Rishita Anubhai | Kasturi Bhattacharjee | Yaser Al-Onaizan | Smaranda Muresan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib abs
What to Fact-Check: Guiding Check-Worthy Information Detection in News Articles through Argumentative Discourse Structure
Tariq Alhindi | Brennan McManus | Smaranda Muresan
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Most existing methods for automatic fact-checking start with a precompiled list of claims to verify. We investigate the understudied problem of determining what statements in news articles are worthy to fact-check. We annotate the argument structure of 95 news articles in the climate change domain that are fact-checked by climate scientists at climatefeedback.org. We release the first multi-layer annotated corpus for both argumentative discourse structure (argument types and relations) and for fact-checked statements in news articles. We discuss the connection between argument structure and check-worthy statements and develop several baseline models for detecting check-worthy statements in the climate change domain. Our preliminary results show that using information about argumentative discourse structure shows slight but statistically significant improvement over a baseline of local discourse structure.

pdf bib abs
Emotion-Infused Models for Explainable Psychological Stress Detection
Elsbeth Turcan | Smaranda Muresan | Kathleen McKeown
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The problem of detecting psychological stress in online posts, and more broadly, of detecting people in distress or in need of help, is a sensitive application for which the ability to interpret models is vital. Here, we present work exploring the use of a semantically related task, emotion detection, for equally competent but more explainable and human-like psychological stress detection as compared to a black-box model. In particular, we explore the use of multi-task learning as well as emotion-based language model fine-tuning. With our emotion-infused models, we see comparable results to state-of-the-art BERT. Our analysis of the words used for prediction show that our emotion-infused models mirror psychological components of stress.

pdf bib abs
MERMAID: Metaphor Generation with Symbolism and Discriminative Decoding
Tuhin Chakrabarty | Xurui Zhang | Smaranda Muresan | Nanyun Peng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Generating metaphors is a challenging task as it requires a proper understanding of abstract concepts, making connections between unrelated concepts, and deviating from the literal meaning. In this paper, we aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs. Based on a theoretically-grounded connection between metaphors and symbols, we propose a method to automatically construct a parallel corpus by transforming a large number of metaphorical sentences from the Gutenberg Poetry corpus (CITATION) to their literal counterpart using recent advances in masked language modeling coupled with commonsense inference. For the generation task, we incorporate a metaphor discriminator to guide the decoding of a sequence to sequence model fine-tuned on our parallel data to generate high-quality metaphors. Human evaluation on an independent test set of literal statements shows that our best model generates metaphors better than three well-crafted baselines 66% of the time on average. A task-based evaluation shows that human-written poems enhanced with metaphors proposed by our model are preferred 68% of the time compared to poems without metaphors.

pdf bib abs
ENTRUST: Argument Reframing with Language Models and Entailment
Tuhin Chakrabarty | Christopher Hidey | Smaranda Muresan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Framing involves the positive or negative presentation of an argument or issue depending on the audience and goal of the speaker. Differences in lexical framing, the focus of our work, can have large effects on peoples’ opinions and beliefs. To make progress towards reframing arguments for positive effects, we create a dataset and method for this task. We use a lexical resource for “connotations” to create a parallel corpus and propose a method for argument reframing that combines controllable text generation (positive connotation) with a post-decoding entailment component (same denotation). Our results show that our method is effective compared to strong baselines along the dimensions of fluency, meaning, and trustworthiness/reduction of fear.

pdf bib
Domain and Task-Informed Sample Selection for Cross-Domain Target-based Sentiment Analysis
Kasturi Bhattacharjee | Rashmi Gangadharaiah | Smaranda Muresan
Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)

pdf bib abs
COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic
Arkadiy Saakyan | Tuhin Chakrabarty | Smaranda Muresan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We introduce a FEVER-like dataset COVID-Fact of 4,086 claims concerning the COVID-19 pandemic. The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence. Unlike previous approaches, we automatically detect true claims and their source articles and then generate counter-claims using automatic methods rather than employing human annotators. Along with our constructed resource, we formally present the task of identifying relevant evidence for the claims and verifying whether the evidence refutes or supports a given claim. In addition to scientific claims, our data contains simplified general claims from media sources, making it better suited for detecting general misinformation regarding COVID-19. Our experiments indicate that COVID-Fact will provide a challenging testbed for the development of new systems and our approach will reduce the costs of building domain-specific datasets for detecting misinformation.

pdf bib abs
Metaphor Generation with Conceptual Mappings
Kevin Stowe | Tuhin Chakrabarty | Nanyun Peng | Smaranda Muresan | Iryna Gurevych
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Generating metaphors is a difficult task as it requires understanding nuanced relationships between abstract concepts. In this paper, we aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs. Guided by conceptual metaphor theory, we propose to control the generation process by encoding conceptual mappings between cognitive domains to generate meaningful metaphoric expressions. To achieve this, we develop two methods: 1) using FrameNet-based embeddings to learn mappings between domains and applying them at the lexical level (CM-Lex), and 2) deriving source/target pairs to train a controlled seq-to-seq generation model (CM-BART). We assess our methods through automatic and human evaluation for basic metaphoricity and conceptual metaphor presence. We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems, and CM-BART outperforms all other models both in automatic and human evaluations.

pdf bib abs
Weakly-Supervised Methods for Suicide Risk Assessment: Role of Related Domains
Chenghao Yang | Yudong Zhang | Smaranda Muresan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Social media has become a valuable resource for the study of suicidal ideation and the assessment of suicide risk. Among social media platforms, Reddit has emerged as the most promising one due to its anonymity and its focus on topic-based communities (subreddits) that can be indicative of someone’s state of mind or interest regarding mental health disorders such as r/SuicideWatch, r/Anxiety, r/depression. A challenge for previous work on suicide risk assessment has been the small amount of labeled data. We propose an empirical investigation into several classes of weakly-supervised approaches, and show that using pseudo-labeling based on related issues around mental health (e.g., anxiety, depression) helps improve model performance for suicide risk assessment.

pdf bib abs
Implicit Premise Generation with Discourse-aware Commonsense Knowledge Models
Tuhin Chakrabarty | Aadit Trivedi | Smaranda Muresan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Enthymemes are defined as arguments where a premise or conclusion is left implicit. We tackle the task of generating the implicit premise in an enthymeme, which requires not only an understanding of the stated conclusion and premise but also additional inferences that could depend on commonsense knowledge. The largest available dataset for enthymemes (Habernal et al., 2018) consists of 1.7k samples, which is not large enough to train a neural text generation model. To address this issue, we take advantage of a similar task and dataset: Abductive reasoning in narrative text (Bhagavatula et al., 2020). However, we show that simply using a state-of-the-art seq2seq model fine-tuned on this data might not generate meaningful implicit premises associated with the given enthymemes. We demonstrate that encoding discourse-aware commonsense during fine-tuning improves the quality of the generated implicit premises and outperforms all other baselines both in automatic and human evaluations on three different datasets.

pdf bib abs
Don’t Go Far Off: An Empirical Study on Neural Poetry Translation
Tuhin Chakrabarty | Arkadiy Saakyan | Smaranda Muresan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Despite constant improvements in machine translation quality, automatic poetry translation remains a challenging problem due to the lack of open-sourced parallel poetic corpora, and to the intrinsic complexities involved in preserving the semantics, style and figurative nature of poetry. We present an empirical investigation for poetry translation along several dimensions: 1) size and style of training data (poetic vs. non-poetic), including a zero-shot setup; 2) bilingual vs. multilingual learning; and 3) language-family-specific models vs. mixed-language-family models. To accomplish this, we contribute a parallel dataset of poetry translations for several language pairs. Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger in size, both in terms of automatic metrics (BLEU, BERTScore, COMET) and human evaluation metrics such as faithfulness (meaning and poetic style). Moreover, multilingual fine-tuning on poetic data outperforms bilingual fine-tuning on poetic data.

2020

bib abs
An Evaluation of Subword Segmentation Strategies for Neural Machine Translation of Morphologically Rich Languages
Aquia Richburg | Ramy Eskander | Smaranda Muresan | Marine Carpuat
Proceedings of the The Fourth Widening Natural Language Processing Workshop

Byte-Pair Encoding (BPE) (Sennrich et al., 2016) has become a standard pre-processing step when building neural machine translation systems. However, it is not clear whether this is an optimal strategy in all settings. We conduct a controlled comparison of subword segmentation strategies for translating two low-resource morphologically rich languages (Swahili and Turkish) into English. We show that segmentations based on a unigram language model (Kudo, 2018) yield comparable BLEU and better recall for translating rare source words than BPE.

pdf bib abs
Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios
Ramy Eskander | Smaranda Muresan | Michael Collins
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We describe a fully unsupervised cross-lingual transfer approach for part-of-speech (POS) tagging under a truly low resource scenario. We assume access to parallel translations between the target language and one or more source languages for which POS taggers are available. We use the Bible as parallel data in our experiments: small size, out-of-domain and covering many diverse languages. Our approach innovates in three ways: 1) a robust approach of selecting training instances via cross-lingual annotation projection that exploits best practices of unsupervised type and token constraints, word-alignment confidence and density of projected POS, 2) a Bi-LSTM architecture that uses contextualized word embeddings, affix embeddings and hierarchical Brown clusters, and 3) an evaluation on 12 diverse languages in terms of language family and morphological typology. In spite of the use of limited and out-of-domain parallel data, our experiments demonstrate significant improvements in accuracy over previous work. In addition, we show that using multi-source information, either via projection or output combination, improves the performance for most target languages.

pdf bib abs
Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation
Tuhin Chakrabarty | Smaranda Muresan | Nanyun Peng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Literary tropes, from poetry to stories, are at the crux of human imagination and communication. Figurative language such as a simile go beyond plain expressions to give readers new insights and inspirations. In this paper, we tackle the problem of simile generation. Generating a simile requires proper understanding for effective mapping of properties between two concepts. To this end, we first propose a method to automatically construct a parallel corpus by transforming a large number of similes collected from Reddit to their literal counterpart using structured common sense knowledge. We then propose to fine-tune a pre-trained sequence to sequence model, BART (Lewis et al 2019), on the literal-simile pairs to gain generalizability, so that we can generate novel similes given a literal sentence. Experiments show that our approach generates 88% novel similes that do not share properties with the training data. Human evaluation on an independent set of literal statements shows that our model generates similes better than two literary experts 37% of the time when compared pairwise. We also show how replacing literal sentences with similes from our best model in machine-generated stories improves evocativeness and leads to better acceptance by human judges.

pdf bib abs
To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging
Kasturi Bhattacharjee | Miguel Ballesteros | Rishita Anubhai | Smaranda Muresan | Jie Ma | Faisal Ladhak | Yaser Al-Onaizan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Leveraging large amounts of unlabeled data using Transformer-like architectures, like BERT, has gained popularity in recent times owing to their effectiveness in learning general representations that can then be further fine-tuned for downstream tasks to much success. However, training these models can be costly both from an economic and environmental standpoint. In this work, we investigate how to effectively use unlabeled data: by exploring the task-specific semi-supervised approach, Cross-View Training (CVT) and comparing it with task-agnostic BERT in multiple settings that include domain and task relevant English data. CVT uses a much lighter model architecture and we show that it achieves similar performance to BERT on a set of sequence tagging tasks, with lesser financial and environmental impact.

pdf bib abs
MultiSeg: Parallel Data and Subword Information for Learning Bilingual Embeddings in Low Resource Scenarios
Efsun Sarioglu Kayi | Vishal Anand | Smaranda Muresan
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Distributed word embeddings have become ubiquitous in natural language processing as they have been shown to improve performance in many semantic and syntactic tasks. Popular models for learning cross-lingual word embeddings do not consider the morphology of words. We propose an approach to learn bilingual embeddings using parallel data and subword information that is expressed in various forms, i.e. character n-grams, morphemes obtained by unsupervised morphological segmentation and byte pair encoding. We report results for three low resource morphologically rich languages (Swahili, Tagalog, and Somali) and a high resource language (German) in a simulated a low-resource scenario. Our results show that our method that leverages subword information outperforms the model without subword information, both in intrinsic and extrinsic evaluations of the learned embeddings. Specifically, analogy reasoning results show that using subwords helps capture syntactic characteristics. Semantically, word similarity results and intrinsically, word translation scores demonstrate superior performance over existing methods. Finally, qualitative analysis also shows better-quality cross-lingual embeddings particularly for morphological variants in both languages.

pdf bib abs
Fact vs. Opinion: the Role of Argumentation Features in News Classification
Tariq Alhindi | Smaranda Muresan | Daniel Preotiuc-Pietro
Proceedings of the 28th International Conference on Computational Linguistics

A 2018 study led by the Media Insight Project showed that most journalists think that a clearmarking of what is news reporting and what is commentary or opinion (e.g., editorial, op-ed)is essential for gaining public trust. We present an approach to classify news articles into newsstories (i.e., reporting of factual information) and opinion pieces using models that aim to sup-plement the article content representation with argumentation features. Our hypothesis is thatthe nature of argumentative discourse is important in distinguishing between news stories andopinion articles. We show that argumentation features outperform linguistic features used previ-ously and improve on fine-tuned transformer-based models when tested on data from publishersunseen in training. Automatically flagging opinion pieces vs. news stories can aid applicationssuch as fact-checking or event extraction.

pdf bib
Interpreting Verbal Irony: Linguistic Strategies and the Connection to theType of Semantic Incongruity
Debanjan Ghosh | Elena Musi | Smaranda Muresan
Proceedings of the Society for Computation in Linguistics 2020

pdf bib abs
MorphAGram, Evaluation and Framework for Unsupervised Morphological Segmentation
Ramy Eskander | Francesca Callejas | Elizabeth Nichols | Judith Klavans | Smaranda Muresan
Proceedings of the 12th Language Resources and Evaluation Conference

Computational morphological segmentation has been an active research topic for decades as it is beneficial for many natural language processing tasks. With the high cost of manually labeling data for morphology and the increasing interest in low-resource languages, unsupervised morphological segmentation has become essential for processing a typologically diverse set of languages, whether high-resource or low-resource. In this paper, we present and release MorphAGram, a publicly available framework for unsupervised morphological segmentation that uses Adaptor Grammars (AG) and is based on the work presented by Eskander et al. (2016). We conduct an extensive quantitative and qualitative evaluation of this framework on 12 languages and show that the framework achieves state-of-the-art results across languages of different typologies (from fusional to polysynthetic and from high-resource to low-resource).

pdf bib abs
Rˆ3: Reverse, Retrieve, and Rank for Sarcasm Generation with Commonsense Knowledge
Tuhin Chakrabarty | Debanjan Ghosh | Smaranda Muresan | Nanyun Peng
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose an unsupervised approach for sarcasm generation based on a non-sarcastic input sentence. Our method employs a retrieve-and-edit framework to instantiate two major characteristics of sarcasm: reversal of valence and semantic incongruity with the context, which could include shared commonsense or world knowledge between the speaker and the listener. While prior works on sarcasm generation predominantly focus on context incongruity, we show that combining valence reversal and semantic incongruity based on the commonsense knowledge generates sarcasm of higher quality. Human evaluation shows that our system generates sarcasm better than humans 34% of the time, and better than a reinforced hybrid baseline 90% of the time.

pdf bib abs
DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking
Christopher Hidey | Tuhin Chakrabarty | Tariq Alhindi | Siddharth Varia | Kriste Krstovski | Mona Diab | Smaranda Muresan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The increased focus on misinformation has spurred development of data and systems for detecting the veracity of a claim as well as retrieving authoritative evidence. The Fact Extraction and VERification (FEVER) dataset provides such a resource for evaluating endto- end fact-checking, requiring retrieval of evidence from Wikipedia to validate a veracity prediction. We show that current systems for FEVER are vulnerable to three categories of realistic challenges for fact-checking – multiple propositions, temporal reasoning, and ambiguity and lexical variation – and introduce a resource with these types of claims. Then we present a system designed to be resilient to these “attacks” using multiple pointer networks for document selection and jointly modeling a sequence of evidence sentences and veracity relation predictions. We find that in handling these attacks we obtain state-of-the-art results on FEVER, largely due to improved evidence retrieval.

pdf bib abs
A Report on the 2020 Sarcasm Detection Shared Task
Debanjan Ghosh | Avijit Vajpayee | Smaranda Muresan
Proceedings of the Second Workshop on Figurative Language Processing

Detecting sarcasm and verbal irony is critical for understanding people’s actual sentiments and beliefs. Thus, the field of sarcasm analysis has become a popular research problem in natural language processing. As the community working on computational approaches for sarcasm detection is growing, it is imperative to conduct benchmarking studies to analyze the current state-of-the-art, facilitating progress in this area. We report on the shared task on sarcasm detection we conducted as a part of the 2nd Workshop on Figurative Language Processing (FigLang 2020) at ACL 2020.

2019

pdf bib abs
Pay “Attention” to your Context when Classifying Abusive Language
Tuhin Chakrabarty | Kilol Gupta | Smaranda Muresan
Proceedings of the Third Workshop on Abusive Language Online

The goal of any social media platform is to facilitate healthy and meaningful interactions among its users. But more often than not, it has been found that it becomes an avenue for wanton attacks. We propose an experimental study that has three aims: 1) to provide us with a deeper understanding of current data sets that focus on different types of abusive language, which are sometimes overlapping (racism, sexism, hate speech, offensive language, and personal attacks); 2) to investigate what type of attention mechanism (contextual vs. self-attention) is better for abusive language detection using deep learning architectures; and 3) to investigate whether stacked architectures provide an advantage over simple architectures for this task.

pdf bib abs
Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages
Ramy Eskander | Judith Klavans | Smaranda Muresan
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

Polysynthetic languages pose a challenge for morphological analysis due to the root-morpheme complexity and to the word class “squish”. In addition, many of these polysynthetic languages are low-resource. We propose unsupervised approaches for morphological segmentation of low-resource polysynthetic languages based on Adaptor Grammars (AG) (Eskander et al., 2016). We experiment with four languages from the Uto-Aztecan family. Our AG-based approaches outperform other unsupervised approaches and show promise when compared to supervised methods, outperforming them on two of the four languages.

pdf bib abs
Rubric Reliability and Annotation of Content and Argument in Source-Based Argument Essays
Yanjun Gao | Alex Driban | Brennan Xavier McManus | Elena Musi | Patricia Davies | Smaranda Muresan | Rebecca J. Passonneau
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present a unique dataset of student source-based argument essays to facilitate research on the relations between content, argumentation skills, and assessment. Two classroom writing assignments were given to college students in a STEM major, accompanied by a carefully designed rubric. The paper presents a reliability study of the rubric, showing it to be highly reliable, and initial annotation on content and argumentation annotation of the essays.

pdf bib abs
Columbia at SemEval-2019 Task 7: Multi-task Learning for Stance Classification and Rumour Verification
Zhuoran Liu | Shivali Goel | Mukund Yelahanka Raghuprasad | Smaranda Muresan
Proceedings of the 13th International Workshop on Semantic Evaluation

The paper presents Columbia team’s participation in the SemEval 2019 Shared Task 7: RumourEval 2019. Detecting rumour on social networks has been a focus of research in recent years. Previous work suffered from data sparsity, which potentially limited the application of more sophisticated neural architecture to this task. We mitigate this problem by proposing a multi-task learning approach together with language model fine-tuning. Our attention-based model allows different tasks to leverage different level of information. Our system ranked 6th overall with an F1-score of 36.25 on stance classification and F1 of 22.44 on rumour verification.

pdf bib abs
ColumbiaNLP at SemEval-2019 Task 8: The Answer is Language Model Fine-tuning
Tuhin Chakrabarty | Smaranda Muresan
Proceedings of the 13th International Workshop on Semantic Evaluation

Community Question Answering forums are very popular nowadays, as they represent effective means for communities to share information around particular topics. But the information shared on these forums are often not authentic. This paper presents the ColumbiaNLP submission for the SemEval-2019 Task 8: Fact-Checking in Community Question Answering Forums. We show how fine-tuning a language model on a large unannotated corpus of old threads from Qatar Living forum helps us to classify question types (factual, opinion, socializing) and to judge the factuality of answers on the shared task labeled data from the same forum. Our system finished 4th and 2nd on Subtask A (question type classification) and B (answer factuality prediction), respectively, based on the official metric of accuracy.

pdf bib abs
AMPERSAND: Argument Mining for PERSuAsive oNline Discussions
Tuhin Chakrabarty | Christopher Hidey | Smaranda Muresan | Kathy McKeown | Alyssa Hwang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Argumentation is a type of discourse where speakers try to persuade their audience about the reasonableness of a claim by presenting supportive arguments. Most work in argument mining has focused on modeling arguments in monologues. We propose a computational model for argument mining in online persuasive discussion forums that brings together the micro-level (argument as product) and macro-level (argument as process) models of argumentation. Fundamentally, this approach relies on identifying relations between components of arguments in a discussion thread. Our approach for relation prediction uses contextual information in terms of fine-tuning a pre-trained language model and leveraging discourse relations based on Rhetorical Structure Theory. We additionally propose a candidate selection method to automatically predict what parts of one’s argument will be targeted by other participants in the discussion. Our models obtain significant improvements compared to recent state-of-the-art approaches using pointer networks and a pre-trained language model.

pdf bib abs
Fine-Tuned Neural Models for Propaganda Detection at the Sentence and Fragment levels
Tariq Alhindi | Jonas Pfeiffer | Smaranda Muresan
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper presents the CUNLP submission for the NLP4IF 2019 shared-task on Fine-Grained Propaganda Detection. Our system finished 5th out of 26 teams on the sentence-level classification task and 5th out of 11 teams on the fragment-level classification task based on our scores on the blind test set. We present our models, a discussion of our ablation studies and experiments, and an analysis of our performance on all eighteen propaganda techniques present in the corpus of the shared task.

2018

pdf bib abs
‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions
Olivia Winn | Smaranda Muresan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a novel paradigm of grounding comparative adjectives within the realm of color descriptions. Given a reference RGB color and a comparative term (e.g., lighter, darker), our model learns to ground the comparative as a direction in the RGB space such that the colors along the vector, rooted at the reference color, satisfy the comparison. Our model generates grounded representations of comparative adjectives with an average accuracy of 0.65 cosine similarity to the desired direction of change. These vectors approach colors with Delta-E scores of under 7 compared to the target colors, indicating the differences are very small with respect to human perception. Our approach makes use of a newly created dataset for this task derived from existing labeled color data.

pdf bib
Proceedings of the Workshop on Figurative Language Processing
Beata Beigman Klebanov | Ekaterina Shutova | Patricia Lichtenstein | Smaranda Muresan | Chee Wee
Proceedings of the Workshop on Figurative Language Processing

pdf bib abs
Where is Your Evidence: Improving Fact-checking by Justification Modeling
Tariq Alhindi | Savvas Petridis | Smaranda Muresan
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

Fact-checking is a journalistic practice that compares a claim made publicly against trusted sources of facts. Wang (2017) introduced a large dataset of validated claims from the POLITIFACT.com website (LIAR dataset), enabling the development of machine learning approaches for fact-checking. However, approaches based on this dataset have focused primarily on modeling the claim and speaker-related metadata, without considering the evidence used by humans in labeling the claims. We extend the LIAR dataset by automatically extracting the justification from the fact-checking article used by humans to label a given claim. We show that modeling the extracted justification in conjunction with the claim (and metadata) provides a significant improvement regardless of the machine learning model used (feature-based or deep learning) both in a binary classification task (true, false) and in a six-way classification task (pants on fire, false, mostly false, half true, mostly true, true).

pdf bib abs
Robust Document Retrieval and Individual Evidence Modeling for Fact Extraction and Verification.
Tuhin Chakrabarty | Tariq Alhindi | Smaranda Muresan
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

This paper presents the ColumbiaNLP submission for the FEVER Workshop Shared Task. Our system is an end-to-end pipeline that extracts factual evidence from Wikipedia and infers a decision about the truthfulness of the claim based on the extracted evidence. Our pipeline achieves significant improvement over the baseline for all the components (Document Retrieval, Sentence Selection and Textual Entailment) both on the development set and the test set. Our team finished 6th out of 24 teams on the leader-board based on the preliminary results with a FEVER score of 49.06 on the blind test set compared to 27.45 of the baseline system.

pdf bib abs
Automatically Tailoring Unsupervised Morphological Segmentation to the Language
Ramy Eskander | Owen Rambow | Smaranda Muresan
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Morphological segmentation is beneficial for several natural language processing tasks dealing with large vocabularies. Unsupervised methods for morphological segmentation are essential for handling a diverse set of languages, including low-resource languages. Eskander et al. (2016) introduced a Language Independent Morphological Segmenter (LIMS) using Adaptor Grammars (AG) based on the best-on-average performing AG configuration. However, while LIMS worked best on average and outperforms other state-of-the-art unsupervised morphological segmentation approaches, it did not provide the optimal AG configuration for five out of the six languages. We propose two language-independent classifiers that enable the selection of the optimal or nearly-optimal configuration for the morphological segmentation of unseen languages.

pdf bib abs
Sarcasm Analysis Using Conversation Context
Debanjan Ghosh | Alexander R. Fabbri | Smaranda Muresan
Computational Linguistics, Volume 44, Issue 4 - December 2018

Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, the speaker’s sarcastic intent is not always apparent without additional context. Focusing on social media discussions, we investigate three issues: (1) does modeling conversation context help in sarcasm detection? (2) can we identify what part of conversation context triggered the sarcastic reply? and (3) given a sarcastic post that contains multiple sentences, can we identify the specific sentence that is sarcastic? To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the current turn. We show that LSTM networks with sentence-level attention on context and current turn, as well as the conditional LSTM network, outperform the LSTM model that reads only the current turn. As conversation context, we consider the prior turn, the succeeding turn, or both. Our computational models are tested on two types of social media platforms: Twitter and discussion forums. We discuss several differences between these data sets, ranging from their size to the nature of the gold-label annotations. To address the latter two issues, we present a qualitative analysis of the attention weights produced by the LSTM models (with attention) and discuss the results compared with human performance on the two tasks.

pdf bib
A Multi-layer Annotated Corpus of Argumentative Text: From Argument Schemes to Discourse Relations
Elena Musi | Manfred Stede | Leonard Kriese | Smaranda Muresan | Andrea Rocci
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib abs
Analyzing the Semantic Types of Claims and Premises in an Online Persuasive Forum
Christopher Hidey | Elena Musi | Alyssa Hwang | Smaranda Muresan | Kathy McKeown
Proceedings of the 4th Workshop on Argument Mining

Argumentative text has been analyzed both theoretically and computationally in terms of argumentative structure that consists of argument components (e.g., claims, premises) and their argumentative relations (e.g., support, attack). Less emphasis has been placed on analyzing the semantic types of argument components. We propose a two-tiered annotation scheme to label claims and premises and their semantic types in an online persuasive forum, Change My View, with the long-term goal of understanding what makes a message persuasive. Premises are annotated with the three types of persuasive modes: ethos, logos, pathos, while claims are labeled as interpretation, evaluation, agreement, or disagreement, the latter two designed to account for the dialogical nature of our corpus. We aim to answer three questions: 1) can humans reliably annotate the semantic types of argument components? 2) are types of premises/claims positioned in recurrent orders? and 3) are certain types of claims and/or premises more likely to appear in persuasive messages than in non-persuasive messages?

pdf bib abs
The Role of Conversation Context for Sarcasm Detection in Online Interactions
Debanjan Ghosh | Alexander Richard Fabbri | Smaranda Muresan
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, speaker’s sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling of conversation context help in sarcasm detection and (2) can we understand what part of conversation context triggered the sarcastic reply. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktäschel et al. 2015) and LSTM networks with sentence level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of attention weights produced by the LSTM models with attention and discuss the results compared with human performance on the task.