Eduardo Blanco

2025

This paper introduces UnSeenTimeQA, a novel data contamination-free time-sensitive question-answering (TSQA) benchmark. It differs from existing TSQA benchmarks by avoiding web-searchable queries grounded in the real world. We present a series of time-sensitive event scenarios based on synthetically generated facts. It requires large language models (LLMs) to engage in genuine temporal reasoning without depending on the factual knowledge acquired during the pre-training phase. Our data generation framework enables on-demand generation of new samples, mitigating the risk of data leakage. We designed three types of time-sensitive questions to test LLMs’ temporal reasoning abilities over sequential and parallel event occurrences. Our evaluation of five LLMs on synthetic fact-based TSQA reveals mixed results: while they perform well on simpler subsets, their overall performance remains inferior as compared to real world fact-based TSQA. Error analysis indicates that LLMs face difficulties in reasoning over long-range event dependencies and parallel events.

pdf bib abs
Assessing the Human Likeness of AI-Generated Counterspeech
Xiaoying Song | Sujana Mamidisetty | Eduardo Blanco | Lingzi Hong
Proceedings of the 31st International Conference on Computational Linguistics

Counterspeech is a targeted response to counteract and challenge abusive or hateful content. It effectively curbs the spread of hatred and fosters constructive online communication. Previous studies have proposed different strategies for automatically generated counterspeech. Evaluations, however, focus on relevance, surface form, and other shallow linguistic characteristics. This paper investigates the human likeness of AI-generated counterspeech, a critical factor influencing effectiveness. We implement and evaluate several LLM-based generation strategies, and discover that AI-generated and human-written counterspeech can be easily distinguished by both simple classifiers and humans. Further, we reveal differences in linguistic characteristics, politeness, and specificity. The dataset used in this study is publicly available for further research.

pdf bib abs
Echoes of Discord: Forecasting Hater Reactions to Counterspeech
Xiaoying Song | Sharon Lisseth Perez | Xinchen Yu | Eduardo Blanco | Lingzi Hong
Findings of the Association for Computational Linguistics: NAACL 2025

Hate speech (HS) erodes the inclusiveness of online users and propagates negativity and division. Counterspeech has been recognized as a way to mitigate the harmful consequences. While some research has investigated the impact of user-generated counterspeech on social media platforms, few have examined and modeled haters’ reactions toward counterspeech, despite the immediate alteration of haters’ attitudes being an important aspect of counterspeech. This study fills the gap by analyzing the impact of counterspeech from the hater’s perspective, focusing on whether the counterspeech leads the hater to reenter the conversation and if the reentry is hateful. We compile the Reddit Echoes of Hate dataset (ReEco), which consists of triple-turn conversations featuring haters’ reactions, to assess the impact of counterspeech. To predict haters’ behaviors, we employ two strategies: a two-stage reaction predictor and a three-way classifier. The linguistic analysis sheds insights on the language of counterspeech to hate eliciting different haters’ reactions. Experimental results demonstrate that the 3-way classification model outperforms the two-stage reaction predictor, which first predicts reentry and then determines the reentry type. We conclude the study with an assessment showing the most common errors identified by the best-performing model.

pdf bib abs
The Lies Characters Tell: Utilizing Large Language Models to Normalize Adversarial Unicode Perturbations
Portia Cooper | Eduardo Blanco | Mihai Surdeanu
Findings of the Association for Computational Linguistics: ACL 2025

Homoglyphs, Unicode characters that are visually homogeneous to Latin letters, are widely used to mask offensive content. Dynamic strategies are needed to combat homoglyphs as the Unicode library is ever-expanding and new substitution possibilities for Latin letters continuously emerge. The present study investigated two novel mitigation approaches that do not rely on strict mappings but instead harness the power of large language models to neutralize both known and unknown homoglyphs: (1) indirectly normalizing homoglyphs by replacing non-Latin characters with a delimiter and prompting large language models to “fill in the blanks” and (2) directly normalizing homoglyphs by using large language models to determine which characters should be replaced with Latin letters. We found that GPT-4o-mini constructed normalized text with an average cosine similarity score of 0.91 to the original tweets when applying our indirect method and 0.96 to the original tweets when applying our direct method. This study indicates that large language model-based normalization techniques can effectively unmask offensive content concealed by homoglyphs. Code and data are available in our GitHub repository: https://github.com/pcoopercoder/The-Lies-Characters-Tell.

pdf bib abs
BEMEAE: Moving Beyond Exact Span Match for Event Argument Extraction
Enfa Fane | Md Nayem Uddin | Oghenevovwe Ikumariegbe | Daniyal Kashif | Eduardo Blanco | Steven Corman
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Event Argument Extraction (EAE) is a key task in natural language processing, focusing on identifying and classifying event arguments in text. However, the widely adopted exact span match (ESM) evaluation metric has notable limitations due to its rigid span constraints, often misidentifying valid predictions as errors and underestimating system performance. In this paper, we evaluate nine state-of-the-art EAE models on the RAMS and GENEVA datasets, highlighting ESM’s limitations. To address these issues, we introduce BEMEAE (Beyond Exact Span Match for Event Argument Extraction), a novel evaluation metric that recognizes predictions that are semantically equivalent to or improve upon the reference. BEMEAE integrates deterministic components with a semantic matching component for more accurate assessment. Our experiments demonstrate that BEMEAE aligns more closely with human judgments. We show that BEMEAE not only leads to higher F1 scores compared to ESM but also results in significant changes in model rankings, underscoring ESM’s inadequacy for comprehensive evaluation of EAE.

pdf bib abs
Making Language Models Robust Against Negation
MohammadHossein Rezaei | Eduardo Blanco
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Negation has been a long-standing challenge for language models.Previous studies have shown that they struggle with negation in many natural language understanding tasks.In this work, we propose a self-supervised method to make language models more robust against negation.We introduce a novel task, Next Sentence Polarity Prediction (NSPP), and a variation of the Next Sentence Prediction (NSP) task.We show that BERT and RoBERTa further pre-trained on our tasks outperform the off-the-shelf versions on nine negation-related benchmarks.Most notably, our pre-training tasks yield between 1.8% and 9.1% improvement on CondaQA, a large question-answering corpus requiring reasoning over negation.

2024

pdf bib abs
Paraphrasing in Affirmative Terms Improves Negation Understanding
MohammadHossein Rezaei | Eduardo Blanco
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Negation is a common linguistic phenomenon. Yet language models face challenges with negation in many natural language understanding tasks such as question answering and natural language inference. In this paper, we experiment with seamless strategies that incorporate affirmative interpretations (i.e., paraphrases without negation) to make models more robust against negation. Crucially, our affirmative interpretations are obtained automatically. We show improvements with CondaQA, a large corpus requiring reasoning with negation, and five natural language understanding tasks.

pdf bib abs
Outcome-Constrained Large Language Models for Countering Hate Speech
Lingzi Hong | Pengcheng Luo | Eduardo Blanco | Xiaoying Song
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Automatic counterspeech generation methods have been developed to assist efforts in combating hate speech. Existing research focuses on generating counterspeech with linguistic attributes such as being polite, informative, and intent-driven. However, the real impact of counterspeech in online environments is seldom considered. This study aims to develop methods for generating counterspeech constrained by conversation outcomes and evaluate their effectiveness. We experiment with large language models (LLMs) to incorporate into the text generation process two desired conversation outcomes: low conversation incivility and non-hateful hater reentry. Specifically, we experiment with instruction prompts, LLM finetuning, and LLM reinforcement learning (RL). Evaluation results show that our methods effectively steer the generation of counterspeech toward the desired outcomes. Our analyses, however, show that there are differences in the quality and style depending on the model.

Claim: This work is not advocating the use of LLMs for paper (meta-)reviewing. Instead, wepresent a comparative analysis to identify and distinguish LLM activities from human activities. Two research goals: i) Enable better recognition of instances when someone implicitly uses LLMs for reviewing activities; ii) Increase community awareness that LLMs, and AI in general, are currently inadequate for performing tasks that require a high level of expertise and nuanced judgment.This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?This study focuses on the topic of LLMs as NLP Researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with “deficiency” labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) “LLMs as Reviewers”, how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) “LLMs as Metareviewers”, how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.

pdf bib abs
Interpreting Answers to Yes-No Questions in Dialogues from Multiple Domains
Zijie Wang | Farzana Rashid | Eduardo Blanco
Findings of the Association for Computational Linguistics: NAACL 2024

People often answer yes-no questions without explicitly saying yes, no, or similar polar key-words. Figuring out the meaning of indirectanswers is challenging, even for large language models. In this paper, we investigate this problem working with dialogues from multiple domains. We present new benchmarks in three diverse domains: movie scripts, tennis interviews, and airline customer service. We present an approach grounded on distant supervision and blended training to quickly adapt to a new dialogue domain. Experimental results show that our approach is never detrimental and yields F1 improvements as high as 11-34%.

pdf bib abs
RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning
Javad Rafiei Asl | Prajwal Panzade | Eduardo Blanco | Daniel Takabi | Zhipeng Cai
Findings of the Association for Computational Linguistics: NAACL 2024

Pre-trained language models (PLMs) have consistently demonstrated outstanding performance across a diverse spectrum of natural language processing tasks. Nevertheless, despite their success with unseen data, current PLM-based representations often exhibit poor robustness in adversarial settings. In this paper, we introduce RobustSentEmbed, a self-supervised sentence embedding framework designed to improve both generalization and robustness in diverse text representation tasks and against a diverse set of adversarial attacks. Through the generation of high-risk adversarial perturbations and their utilization in a novel objective function, RobustSentEmbed adeptly learns high-quality and robust sentence embeddings. Our experiments confirm the superiority of RobustSentEmbed over state-of-the-art representations. Specifically, Our framework achieves a significant reduction in the success rate of various adversarial attacks, notably reducing the BERTAttack success rate by almost half (from 75.51% to 38.81%). The framework also yields improvements of 1.59% and 0.23% in semantic textual similarity tasks and various transfer tasks, respectively.

pdf bib abs
Learning to Generate Rules for Realistic Few-Shot Relation Classification: An Encoder-Decoder Approach
Mayank Singh | Eduardo Blanco
Findings of the Association for Computational Linguistics: EMNLP 2024

We propose a neuro-symbolic approach for realistic few-shot relation classification via rules. Instead of building neural models to predict relations, we design them to output straightforward rules that can be used to extract relations. The rules are generated using custom T5-style Encoder-Decoder Language Models. Crucially, our rules are fully interpretable and pliable (i.e., humans can easily modify them to boost performance). Through a combination of rules generated by these models along with a very effective, novel baseline, we demonstrate a few-shot relation-classification performance that is comparable to or stronger than the state of the art on the Few-Shot TACRED and NYT29 benchmarks while increasing interpretability and maintaining pliability.

pdf bib abs
ALIGN-SIM: A Task-Free Test Bed for Evaluating and Interpreting Sentence Embeddings through Semantic Similarity Alignment
Yash Mahajan | Naman Bansal | Eduardo Blanco | Santu Karmaker
Findings of the Association for Computational Linguistics: EMNLP 2024

Sentence embeddings play a pivotal role in a wide range of NLP tasks, yet evaluating and interpreting these real-valued vectors remains an open challenge to date, especially in a task-free setting. To address this challenge, we introduce a novel task-free test bed for evaluating and interpreting sentence embeddings. Our test bed consists of five semantic similarity alignment criteria, namely, *semantic distinction, synonym replacement, antonym replacement, paraphrasing without negation, and sentence jumbling*. Using these criteria, we examined five classical (e.g., Sentence-BERT, Universal Sentence Encoder (USE), etc.) and eight LLM-induced sentence embedding techniques (e.g., LLaMA2, GPT-3, OLMo, etc.) to test whether their semantic similarity spaces align with what a human mind would naturally expect. Our extensive experiments with 13 different sentence encoders revealed that none of the studied embeddings aligned with all the five semantic similarity alignment criteria. Yet, most encoders performed highly on the SentEval dataset, a popular task-specific benchmark. This finding demonstrates a significant limitation of the current practice in sentence embedding evaluation and associated popular benchmarks, a critical issue that needs careful attention and reassessment by the NLP community. Finally, we conclude the paper by highlighting the utility of the proposed alignment-based test bed for analyzing sentence embeddings in a novel way, especially in a task-free setting.

pdf bib abs
Analyzing Large Language Models’ Capability in Location Prediction
Zhaomin Xiao | Yan Huang | Eduardo Blanco
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we investigate and evaluate large language models’ capability in location prediction. We present experimental results with four models—FLAN-T5, FLAN-UL2, FLAN-Alpaca, and ChatGPT—in various instruction finetuning and exemplar settings. We analyze whether taking into account the context—tweets published before and after the tweet mentioning a location—is beneficial. Additionally, we conduct an ablation study to explore whether instruction modification is beneficial. Lastly, our qualitative analysis sheds light on the errors made by the best-performing model.

pdf bib abs
Asking and Answering Questions to Extract Event-Argument Structures
Md Nayem Uddin | Enfa Rose George | Eduardo Blanco | Steven R. Corman
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a question-answering approach to extract document-level event-argument structures. We automatically ask and answer questions for each argument type an event may have. Questions are generated using manually defined templates and generative transformers. Template-based questions are generated using predefined role-specific wh-words and event triggers from the context document. Transformer-based questions are generated using large language models trained to formulate questions based on a passage and the expected answer. Additionally, we develop novel data augmentation strategies specialized in inter-sentential event-argument relations. We use a simple span-swapping technique, coreference resolution, and large language models to augment the training instances. Our approach enables transfer learning without any corpora-specific modifications and yields competitive results with the RAMS dataset. It outperforms previous work, and it is especially beneficial to extract arguments that appear in different sentences than the event trigger. We also present detailed quantitative and qualitative analyses shedding light on the most common errors made by our best model.

pdf bib abs
Generating Uncontextualized and Contextualized Questions for Document-Level Event Argument Extraction
Md Nayem Uddin | Enfa George | Eduardo Blanco | Steven Corman
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

This paper presents multiple question generation strategies for document-level event argument extraction. These strategies do not require human involvement and result in uncontextualized questions as well as contextualized questions grounded on the event and document of interest. Experimental results show that combining uncontextualized and contextualized questions is beneficial,especially when event triggers and arguments appear in different sentences. Our approach does not have corpus-specific components, in particular, the question generation strategies transfer across corpora. We also present a qualitative analysis of the most common errors made by our best model.

2023

pdf bib abs
A Fine-Grained Taxonomy of Replies to Hate Speech
Xinchen Yu | Ashley Zhao | Eduardo Blanco | Lingzi Hong
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Countering rather than censoring hate speech has emerged as a promising strategy to address hatred. There are many types of counterspeech in user-generated content: addressing the hateful content or its author, generic requests, well-reasoned counter arguments, insults, etc. The effectiveness of counterspeech, which we define as subsequent incivility, depends on these types. In this paper, we present a theoretically grounded taxonomy of replies to hate speech and a new corpus. We work with real, user-generated hate speech and all the replies it elicits rather than replies generated by a third party. Our analyses provide insights into the content real users reply with as well as which replies are empirically most effective. We also experiment with models to characterize the replies to hate speech, thereby opening the door to estimating whether a reply to hate speech will result in further incivility.

pdf bib abs
Finding Authentic Counterhate Arguments: A Case Study with Public Figures
Abdullah Albanyan | Ahmed Hassan | Eduardo Blanco
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We explore authentic counterhate arguments for online hateful content toward individuals. Previous efforts are limited to counterhate to fight against hateful content toward groups. Thus, we present a corpus of 54,816 hateful tweet-paragraph pairs, where the paragraphs are candidate counterhate arguments. The counterhate arguments are retrieved from 2,500 online articles from multiple sources. We propose a methodology that assures the authenticity of the counter argument and its specificity to the individual of interest. We show that finding arguments in online articles is an efficient alternative to counterhate generation approaches that may hallucinate unsupported arguments. We also present linguistic insights on the language used in counterhate arguments. Experimental results show promising results. It is more challenging, however, to identify counterhate arguments for hateful content toward individuals not included in the training set.

Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data, and demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). We show that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages).

pdf bib abs
Hiding in Plain Sight: Tweets with Hate Speech Masked by Homoglyphs
Portia Cooper | Mihai Surdeanu | Eduardo Blanco
Findings of the Association for Computational Linguistics: EMNLP 2023

To avoid detection by current NLP monitoring applications, progenitors of hate speech often replace one or more letters in offensive words with homoglyphs, visually similar Unicode characters. Harvesting real-world hate speech containing homoglyphs is challenging due to the vast replacement possibilities. We developed a character substitution scraping method and assembled the Offensive Tweets with Homoglyphs (OTH) Dataset (N=90,788) with more than 1.5 million occurrences of 1,281 non-Latin characters (emojis excluded). In an annotated sample (n=700), 40.14% of the tweets were found to contain hate speech. We assessed the performance of seven transformer-based hate speech detection models and found that they performed poorly in a zero-shot setting (F1 scores between 0.04 and 0.52) but normalizing the data dramatically improved detection (F1 scores between 0.59 and 0.71). Training the models using the annotated data further boosted performance (highest micro-averaged F1 score=0.88, using five-fold cross validation). This study indicates that a dataset containing homoglyphs known and unknown to the scraping script can be collected, and that neural models can be trained to recognize camouflaged real-world hate speech.

pdf bib abs
RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training
Javad Asl | Eduardo Blanco | Daniel Takabi
Findings of the Association for Computational Linguistics: EMNLP 2023

Pre-trained language models (PLMs) have demonstrated their exceptional performance across a wide range of natural language processing tasks. The utilization of PLM-based sentence embeddings enables the generation of contextual representations that capture rich semantic information. However, despite their success with unseen samples, current PLM-based representations suffer from poor robustness in adversarial scenarios. In this paper, we propose RobustEmbed, a self-supervised sentence embedding framework that enhances both generalization and robustness in various text representation tasks and against diverse adversarial attacks. By generating high-risk adversarial perturbations to promote higher invariance in the embedding space and leveraging the perturbation within a novel contrastive objective approach, RobustEmbed effectively learns high-quality sentence embeddings. Our extensive experiments validate the superiority of RobustEmbed over previous state-of-the-art self-supervised representations in adversarial settings, while also showcasing relative improvements in seven semantic textual similarity (STS) tasks and six transfer tasks. Specifically, our framework achieves a significant reduction in attack success rate from 75.51% to 39.62% for the BERTAttack attack technique, along with enhancements of 1.20% and 0.40% in STS tasks and transfer tasks, respectively.

pdf bib abs
Interpreting Answers to Yes-No Questions in User-Generated Content
Shivam Mathur | Keun Park | Dhivya Chinnappa | Saketh Kotamraju | Eduardo Blanco
Findings of the Association for Computational Linguistics: EMNLP 2023

Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and the few answers that include them are rarely to be interpreted what the keywords suggest. In this paper, we present a new corpus of 4,442 yes-no question-answer pairs from Twitter. We discuss linguistic characteristics of answers whose interpretation is yes or no, as well as answers whose interpretation is unknown. We show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media.

pdf bib
Context Helps Determine Spatial Knowledge from Tweets
Zhaomin Xiao | Yan Huang | Eduardo Blanco
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)

pdf bib abs
Synthetic Dataset for Evaluating Complex Compositional Knowledge for Natural Language Inference
Sushma Anand Akoju | Robert Vacareanu | Eduardo Blanco | Haris Riaz | Mihai Surdeanu
Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)

We introduce a synthetic dataset called Sentences Involving Complex Compositional Knowledge (SICCK) and a novel analysis that investigates the performance of Natural Language Inference (NLI) models to understand compositionality in logic. We produce 1,304 sentence pairs by modifying 15 examples from the SICK dataset (Marelli et al., 2014). To this end, we modify the original texts using a set of phrases modifiers that correspond to universal quantifiers, existential quantifiers, negation, and other concept modifiers in Natural Logic (NL) (MacCartney, 2009). We use these phrases to modify the subject, verb, and object parts of the premise and hypothesis. Lastly, we annotate these modified texts with the corresponding entailment labels following NL rules. We conduct a preliminary verification of how well the change in the structural and semantic composition is captured by neural NLI models, in both zero-shot and fine-tuned scenarios. We found that the performance of NLI models under the zero-shot setting is poor, especially for modified sentences with negation and existential quantifiers. After fine-tuning this dataset, we observe that models continue to perform poorly over negation, existential and universal modifiers.

pdf bib abs
Not All Counterhate Tweets Elicit the Same Replies: A Fine-Grained Analysis
Abdullah Albanyan | Ahmed Hassan | Eduardo Blanco
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Counterhate arguments can effectively fight and limit the spread of hate speech. However, they can also exacerbate the hate, as some people may respond with aggression if they feel threatened or targeted by the counterhate. In this paper, we investigate replies to counterhate arguments beyond whether the reply agrees or disagrees with the counterhate argument. We present a corpus with 2,621 replies to counterhate arguments countering hateful tweets, and annotate them with fine-grained characteristics. We show that (a) half of the replies (51%) to the counterhate arguments disagree with the argument, and (b) this kind of reply often supports the hateful tweet (40%). We also analyze the language of counterhate arguments that elicit certain types of replies. Experimental results show that it is feasible to anticipate the kind of replies a counterhate argument will elicit.

2022

pdf bib abs
An Analysis of Negation in Natural Language Understanding Corpora
Md Mosharaf Hossain | Dhivya Chinnappa | Eduardo Blanco
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper analyzes negation in eight popular corpora spanning six natural language understanding tasks. We show that these corpora have few negations compared to general-purpose English, and that the few negations in them are often unimportant. Indeed, one can often ignore negations and still make the right predictions. Additionally, experimental results show that state-of-the-art transformers trained with these corpora obtain substantially worse results with instances that contain negation, especially if the negations are important. We conclude that new corpora accounting for negation are needed to solve natural language understanding tasks when negation is present.

pdf bib abs
Are People Located in the Places They Mention in Their Tweets? A Multimodal Approach
Zhaomin Xiao | Eduardo Blanco
Proceedings of the 29th International Conference on Computational Linguistics

This paper introduces the problem of determining whether people are located in the places they mention in their tweets. In particular, we investigate the role of text and images to solve this challenging problem. We present a new corpus of tweets that contain both text and images. Our analyses show that this problem is multimodal at its core: human judgments depend on whether annotators have access to the text, the image, or both. Experimental results show that a neural architecture that combines both modalities yields better results. We also conduct an error analysis to provide insights into why and when each modality is beneficial.

pdf bib abs
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding
Md Mosharaf Hossain | Eduardo Blanco
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Negation poses a challenge in many natural language understanding tasks. Inspired by the fact that understanding a negated statement often requires humans to infer affirmative interpretations, in this paper we show that doing so benefits models for three natural language understanding tasks. We present an automated procedure to collect pairs of sentences with negation and their affirmative interpretations, resulting in over 150,000 pairs. Experimental results show that leveraging these pairs helps (a) T5 generate affirmative interpretations from negations in a previous benchmark, and (b) a RoBERTa-based classifier solve the task of natural language inference. We also leverage our pairs to build a plug-and-play neural generator that given a negated statement generates an affirmative interpretation. Then, we incorporate the pretrained generator into a RoBERTa-based classifier for sentiment analysis and show that doing so improves the results. Crucially, our proposal does not require any manual effort.

This paper explores a question-answer driven approach to reveal affirmative interpretations from verbal negations (i.e., when a negation cue grammatically modifies a verb). We create a new corpus consisting of 4,472 verbal negations and discover that 67.1% of them convey that an event actually occurred. Annotators generate and answer 7,277 questions % converted for 4,000 for the 3,001 negations that convey an affirmative interpretation. We first cast the problem of revealing affirmative interpretations from negations as a natural language inference (NLI) classification task. Experimental results show that state-of-the-art transformers trained with existing NLI corpora are insufficient to reveal affirmative interpretations. We also observe, however, that fine-tuning brings substantial improvements. In addition to NLI classification, we also explore the more realistic task of generating affirmative interpretations directly from negations with the T5 transformer. We conclude that the generation task remains a challenge as T5 substantially underperforms humans.

pdf bib abs
Disentangling Indirect Answers to Yes-No Questions in Real Conversations
Krishna Sanagavarapu | Jathin Singaraju | Anusha Kakileti | Anirudh Kaza | Aaron Mathews | Helen Li | Nathan Brito | Eduardo Blanco
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, we explore the task of determining indirect answers to yes-no questions in real conversations. We work with transcripts of phone conversations in the Switchboard Dialog Act (SwDA) corpus and create SwDA-IndirectAnswers (SwDA-IA), a subset of SwDA consisting of all conversations containing a yes-no question with an indirect answer. We annotate the underlying direct answers to the yes-no questions (yes, probably yes, middle, probably no, or no). We show that doing so requires taking into account conversation context: the indirect answer alone is insufficient to determine the ground truth. Experimental results also show that taking into account context is beneficial. More importantly, our results demonstrate that existing corpora with synthetic indirect answers to yes-no questions are not beneficial when working with real conversations. Our best models outperform the majority baseline by a substantial margin, but the task remains a challenge (F1: 0.46).

pdf bib abs
Hate Speech and Counter Speech Detection: Conversational Context Does Matter
Xinchen Yu | Eduardo Blanco | Lingzi Hong
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Hate speech is plaguing the cyberspace along with user-generated content. Adding counter speech has become an effective way to combat hate speech online. Existing datasets and models target either (a) hate speech or (b) hate and counter speech but disregard the context. This paper investigates the role of context in the annotation and detection of online hate and counter speech, where context is defined as the preceding comment in a conversation thread. We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral. Our analyses indicate that context is critical to identify hate and counter speech: human judgments change for most comments depending on whether we show annotators the context. A linguistic analysis draws insights into the language people use to express hate and counter speech. Experimental results show that neural networks obtain significantly better results if context is taken into account. We also present qualitative error analyses shedding light into (a) when and why context is beneficial and (b) the remaining errors made by our best model when context is taken into account.

2021

pdf bib abs
Written Justifications are Key to Aggregate Crowdsourced Forecasts
Saketh Kotamraju | Eduardo Blanco
Findings of the Association for Computational Linguistics: EMNLP 2021

This paper demonstrates that aggregating crowdsourced forecasts benefits from modeling the written justifications provided by forecasters. Our experiments show that the majority and weighted vote baselines are competitive, and that the written justifications are beneficial to call a question throughout its life except in the last quarter. We also conduct an error analysis shedding light into the characteristics that make a justification unreliable.

2020

pdf bib abs
Beyond Possession Existence: Duration and Co-Possession
Dhivya Chinnappa | Srikala Murugan | Eduardo Blanco
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper introduces two tasks: determining (a) the duration of possession relations and (b) co-possessions, i.e., whether multiple possessors possess a possessee at the same time. We present new annotations on top of corpora annotating possession existence and experimental results. Regarding possession duration, we derive the time spans we work with empirically from annotations indicating lower and upper bounds. Regarding co-possessions, we use a binary label. Cohen’s kappa coefficients indicate substantial agreement, and experimental results show that text is more useful than the image for solving these tasks.

pdf bib abs
Predicting the Focus of Negation: Model and Error Analysis
Md Mosharaf Hossain | Kathleen Hamilton | Alexis Palmer | Eduardo Blanco
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The focus of a negation is the set of tokens intended to be negated, and a key component for revealing affirmative alternatives to negated utterances. In this paper, we experiment with neural networks to predict the focus of negation. Our main novelty is leveraging a scope detector to introduce the scope of negation as an additional input to the network. Experimental results show that doing so obtains the best results to date. Additionally, we perform a detailed error analysis providing insights into the main error categories, and analyze errors depending on whether the model takes into account scope and context information.

Patient adherence is a critical factor in health outcomes. We present a framework to extract adherence information from electronic health records, including both sentence-level information indicating general adherence information (full, partial, none, etc.) and span-level information providing additional information such as adherence type (medication or nonmedication), reasons and outcomes. We annotate and make publicly available a new corpus of 3,000 de-identified sentences, and discuss the language physicians use to document adherence information. We also explore models based on state-of-the-art transformers to automate both tasks.

pdf bib abs
An Analysis of Natural Language Inference Benchmarks through the Lens of Negation
Md Mosharaf Hossain | Venelin Kovatchev | Pranoy Dutta | Tiffany Kao | Elizabeth Wei | Eduardo Blanco
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Negation is underrepresented in existing natural language inference benchmarks. Additionally, one can often ignore the few negations in existing benchmarks and still make the right inference judgments. In this paper, we present a new benchmark for natural language inference in which negation plays a critical role. We also show that state-of-the-art transformers struggle making inference judgments with the new pairs.

pdf bib abs
Helpful or Hierarchical? Predicting the Communicative Strategies of Chat Participants, and their Impact on Success
Farzana Rashid | Tommaso Fornaciari | Dirk Hovy | Eduardo Blanco | Fernando Vega-Redondo
Findings of the Association for Computational Linguistics: EMNLP 2020

When interacting with each other, we motivate, advise, inform, show love or power towards our peers. However, the way we interact may also hold some indication on how successful we are, as people often try to help each other to achieve their goals. We study the chat interactions of thousands of aspiring entrepreneurs who discuss and develop business models. We manually annotate a set of about 5,500 chat interactions with four dimensions of interaction styles (motivation, cooperation, equality, advice). We find that these styles can be reliably predicted, and that the communication styles can be used to predict a number of indices of business success. Our findings indicate that successful communicators are also successful in other domains.

pdf bib abs
It’s not a Non-Issue: Negation as a Source of Error in Machine Translation
Md Mosharaf Hossain | Antonios Anastasopoulos | Eduardo Blanco | Alexis Palmer
Findings of the Association for Computational Linguistics: EMNLP 2020

As machine translation (MT) systems progress at a rapid pace, questions of their adequacy linger. In this study we focus on negation, a universal, core property of human language that significantly affects the semantics of an utterance. We investigate whether translating negation is an issue for modern MT systems using 17 translation directions as test bed. Through thorough analysis, we find that indeed the presence of negation can significantly impact downstream quality, in some cases resulting in quality reductions of more than 60%. We also provide a linguistically motivated analysis that directly explains the majority of our findings. We release our annotations and code to replicate our analysis here: https://github.com/mosharafhossain/negation-mt.

pdf bib abs
Determining Event Outcomes: The Case of #fail
Srikala Murugan | Dhivya Chinnappa | Eduardo Blanco
Findings of the Association for Computational Linguistics: EMNLP 2020

This paper targets the task of determining event outcomes in social media. We work with tweets containing either #cookingFail or #bakingFail, and show that many of the events described in them resulted in something edible. Tweets that contain images are more likely to result in edible albeit imperfect outcomes. Experimental results show that edibility is easier to predict than outcome quality.

pdf bib abs
WikiPossessions: Possession Timeline Generation as an Evaluation Benchmark for Machine Reading Comprehension of Long Texts
Dhivya Chinnappa | Alexis Palmer | Eduardo Blanco
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents WikiPossessions, a new benchmark corpus for the task of temporally-oriented possession (TOP), or tracking objects as they change hands over time. We annotate Wikipedia articles for 90 different well-known artifacts paintings, diamonds, and archaeological artifacts), producing 799 artifact-possessor relations with associated attributes. For each article, we also produce a full possession timeline. The full version of the task combines straightforward entity-relation extraction with complex temporal reasoning, as well as verification of textual support for the relevant types of knowledge. Specifically, to complete the full TOP task for a given article, a system must do the following: a) identify possessors; b) anchor possessors to times/events; c) identify temporal relations between each temporal anchor and the possession relation it corresponds to; d) assign certainty scores to each possessor and each temporal relation; and e) assemble individual possession events into a global possession timeline. In addition to the corpus, we release evaluation scripts and a baseline model for the task.

pdf bib abs
Detecting Negation Cues and Scopes in Spanish
Salud María Jiménez-Zafra | Roser Morante | Eduardo Blanco | María Teresa Martín Valdivia | L. Alfonso Ureña López
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this work we address the processing of negation in Spanish. We first present a machine learning system that processes negation in Spanish. Specifically, we focus on two tasks: i) negation cue detection and ii) scope identification. The corpus used in the experimental framework is the SFU Corpus. The results for cue detection outperform state-of-the-art results, whereas for scope detection this is the first system that performs the task for Spanish. Moreover, we provide a qualitative error analysis aimed at understanding the limitations of the system and showing which negation cues and scopes are straightforward to predict automatically, and which ones are challenging.

2019

pdf bib abs
Extracting Possessions from Social Media: Images Complement Language
Dhivya Chinnappa | Srikala Murugan | Eduardo Blanco
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper describes a new dataset and experiments to determine whether authors of tweets possess the objects they tweet about. We work with 5,000 tweets and show that both humans and neural networks benefit from images in addition to text. We also introduce a simple yet effective strategy to incorporate visual information into any neural network beyond weights from pretrained networks. Specifically, we consider the tags identified in an image as an additional textual input, and leverage pretrained word embeddings as usually done with regular text. Experimental results show this novel strategy is beneficial.

pdf bib abs
Incorporating Emoji Descriptions Improves Tweet Classification
Abhishek Singh | Eduardo Blanco | Wei Jin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Tweets are short messages that often include specialized language such as hashtags and emojis. In this paper, we present a simple strategy to process emojis: replace them with their natural language description and use pretrained word embeddings as normally done with standard words. We show that this strategy is more effective than using pretrained emoji embeddings for tweet classification. Specifically, we obtain new state-of-the-art results in irony detection and sentiment analysis despite our neural network is simpler than previous proposals.

pdf bib abs
A Corpus of Negations and their Underlying Positive Interpretations
Zahra Sarabi | Erin Killian | Eduardo Blanco | Alexis Palmer
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Negation often conveys implicit positive meaning. In this paper, we present a corpus of negations and their underlying positive interpretations. We work with negations from Simple Wikipedia, automatically generate potential positive interpretations, and then collect manual annotations that effectively rewrite the negation in positive terms. This procedure yields positive interpretations for approximately 77% of negations, and the final corpus includes over 5,700 negations and over 5,900 positive interpretations. We also present baseline results using seq2seq neural models.

2018

pdf bib abs
Possessors Change Over Time: A Case Study with Artworks
Dhivya Chinnappa | Eduardo Blanco
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper presents a corpus and experimental results to extract possession relations over time. We work with Wikipedia articles about artworks, and extract possession relations along with temporal information indicating when these relations are true. The annotation scheme yields many possessors over time for a given artwork, and experimental results show that an LSTM ensemble can automate the task.

pdf bib abs
Characterizing Interactions and Relationships between People
Farzana Rashid | Eduardo Blanco
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper presents a set of dimensions to characterize the association between two people. We distinguish between interactions (when somebody refers to somebody in a conversation) and relationships (a sequence of interactions). We work with dialogue scripts from the TV show Friends, and do not impose any restrictions on the interactions and relationships. We introduce and analyze a new corpus, and present experimental results showing that the task can be automated.

pdf bib
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Eduardo Blanco | Wei Lu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

pdf bib
Annotating Temporally-Anchored Spatial Knowledge by Leveraging Syntactic Dependencies
Alakananda Vempala | Eduardo Blanco
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Annotating If the Authors of a Tweet are Located at the Locations They Tweet About
Vivek Doudagiri | Alakananda Vempala | Eduardo Blanco
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs
Mining Possessions: Existence, Type and Temporal Anchors
Dhivya Chinnappa | Eduardo Blanco
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

This paper presents a corpus and experiments to mine possession relations from text. Specifically, we target alienable and control possessions, and assign temporal anchors indicating when the possession holds between possessor and possessee. We present new annotations for this task, and experimental results using both traditional classifiers and neural networks. Results show that the three subtasks (predicting possession existence, possession type and temporal anchors) can be automated.

pdf bib abs
Determining Event Durations: Models and Error Analysis
Alakananda Vempala | Eduardo Blanco | Alexis Palmer
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

This paper presents models to predict event durations. We introduce aspectual features that capture deeper linguistic information than previous work, and experiment with neural networks. Our analysis shows that tense, aspect and temporal structure of the clause provide useful clues, and that an LSTM ensemble captures relevant context around the event.

pdf bib
Proceedings of the Workshop on Computational Semantics beyond Events and Roles
Eduardo Blanco | Roser Morante
Proceedings of the Workshop on Computational Semantics beyond Events and Roles

2017

pdf bib abs
Dimensions of Interpersonal Relationships: Corpus and Experiments
Farzana Rashid | Eduardo Blanco
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper presents a corpus and experiments to determine dimensions of interpersonal relationships. We define a set of dimensions heavily inspired by work in social science. We create a corpus by retrieving pairs of people, and then annotating dimensions for their relationships. A corpus analysis shows that dimensions can be annotated reliably. Experimental results show that given a pair of people, values to dimensions can be assigned automatically.

pdf bib abs
If No Media Were Allowed inside the Venue, Was Anybody Allowed?
Zahra Sarabi | Eduardo Blanco
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper presents a framework to understand negation in positive terms. Specifically, we extract positive meaning from negation when the negation cue syntactically modifies a noun or adjective. Our approach is grounded on generating potential positive interpretations automatically, and then scoring them. Experimental results show that interpretations scored high can be reliably identified.

pdf bib abs
Determining Whether and When People Participate in the Events They Tweet About
Krishna Chaitanya Sanagavarapu | Alakananda Vempala | Eduardo Blanco
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past, present and events that may occur in the future. We present new annotations using 1,096 event mentions, and experimental results showing that the task is challenging.

pdf bib
Proceedings of the Workshop Computational Semantics Beyond Events and Roles
Eduardo Blanco | Roser Morante | Roser Saurí
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

2016

pdf bib
Automatic Extraction of Implicit Interpretations from Modal Constructions
Jordan Sanders | Eduardo Blanco
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Understanding Negation in Positive Terms Using Syntactic Dependencies
Zahra Sarabi | Eduardo Blanco
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib abs
Annotating Temporally-Anchored Spatial Knowledge on Top of OntoNotes Semantic Roles
Alakananda Vempala | Eduardo Blanco
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a two-step methodology to annotate spatial knowledge on top of OntoNotes semantic roles. First, we manipulate semantic roles to automatically generate potential additional spatial knowledge. Second, we crowdsource annotations with Amazon Mechanical Turk to either validate or discard the potential additional spatial knowledge. The resulting annotations indicate whether entities are or are not located somewhere with a degree of certainty, and temporally anchor this spatial information. Crowdsourcing experiments show that the additional spatial knowledge is ubiquitous and intuitive to humans, and experimental results show that it can be inferred automatically using standard supervised machine learning techniques.

pdf bib
Automatic Generation and Scoring of Positive Interpretations from Negated Statements
Eduardo Blanco | Zahra Sarabi
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long
Alakananda Vempala | Eduardo Blanco
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)
Eduardo Blanco | Roser Morante | Roser Saurí
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)

Semantic representation of text is key to text understanding and reasoning. In this paper, we present Polaris, Lymba's semantic parser. Polaris is a supervised semantic parser that given text extracts semantic relations. It extracts relations from a wide variety of lexico-syntactic patterns, including verb-argument structures, noun compounds and others. The output can be provided in several formats: XML, RDF triples, logic forms or plain text, facilitating interoperability with other tools. Polaris is implemented using eight separate modules. Each module is explained and a detailed example of processing using a sample sentence is provided. Overall results using a benchmark are discussed. Per module performance, including errors made and pruned by each module are also analyzed.

pdf bib
Fine-Grained Focus for Pinpointing Positive Implicit Meaning from Negated Statements
Eduardo Blanco | Dan Moldovan
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
*SEM 2012 Shared Task: Resolving the Scope and Focus of Negation
Roser Morante | Eduardo Blanco
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf bib
Semantic Representation of Negation Using Focus Detection
Eduardo Blanco | Dan Moldovan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Unsupervised Learning of Semantic Relation Composition
Eduardo Blanco | Dan Moldovan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A Model for Composing Semantic Relations
Eduardo Blanco | Dan Moldovan
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

pdf bib
Composition of Semantic Relations: Model and Applications
Eduardo Blanco | Hakki C. Cankaya | Dan Moldovan
Coling 2010: Posters

pdf bib
Automatic Discovery of Manner Relations and its Applications
Eduardo Blanco | Dan Moldovan
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2008

pdf bib abs
Causal Relation Extraction
Eduardo Blanco | Nuria Castell | Dan Moldovan
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a supervised method for the detection and extraction of Causal Relations from open domain text. First we give a brief outline of the definition of causation and how it relates to other Semantic Relations, as well as a characterization of their encoding. In this work, we only consider marked and explicit causations. Our approach first identifies the syntactic patterns that may encode a causation, then we use Machine Learning techniques to decide whether or not a pattern instance encodes a causation. We focus on the most productive pattern, a verb phrase followed by a relator and a clause, and its reverse version, a relator followed by a clause and a verb phrase. As relators we consider the words as, after, because and since. We present a set of lexical, syntactic and semantic features for the classification task, their rationale and some examples. The results obtained are discussed and the errors analyzed.