Sanda Harabagiu

Also published as: Sanda M. Harabagiu


2022

pdf
VaccineLies: A Natural Language Resource for Learning to Recognize Misinformation about the COVID-19 and HPV Vaccines
Maxwell Weinzierl | Sanda Harabagiu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Billions of COVID-19 vaccines have been administered, but many remain hesitant. Misinformation about the COVID-19 vaccines and other vaccines, propagating on social media, is believed to drive hesitancy towards vaccination. The ability to automatically recognize misinformation targeting vaccines on Twitter depends on the availability of data resources. In this paper we present VaccineLies, a large collection of tweets propagating misinformation about two vaccines: the COVID-19 vaccines and the Human Papillomavirus (HPV) vaccines. Misinformation targets are organized in vaccine-specific taxonomies, which reveal the misinformation themes and concerns. The ontological commitments of the misinformation taxonomies provide an understanding of which misinformation themes and concerns dominate the discourse about the two vaccines covered in VaccineLies. The organization into training, testing and development sets of VaccineLies invites the development of novel supervised methods for detecting misinformation on Twitter and identifying the stance towards it. Furthermore, VaccineLies can be a stepping stone for the development of datasets focusing on misinformation targeting additional vaccines.

2020

pdf
The Language of Brain Signals: Natural Language Processing of Electroencephalography Reports
Ramon Maldonado | Sanda Harabagiu
Proceedings of the Twelfth Language Resources and Evaluation Conference

Brain signals are captured by clinical electroencephalography (EEG), which is an excellent tool for probing neural function. When EEG tests are performed, a textual EEG report is generated by the neurologist to document the findings, thus using language that describes the brain signals and their clinical correlations. Even with the impetus provided by the BRAIN initiative (braininitiative.nih.gov), there are no annotations available in texts that capture language describing the brain activities and their correlations with various pathologies. In this paper we describe an annotation effort carried out on a large corpus of EEG reports, providing examples of EEG-specific and clinically relevant concepts. In addition, we detail our annotation schema for brain signal attributes. We also discuss the resulting annotation of long-distance relations between concepts in EEG reports. By exemplifying a self-attention joint-learning approach for predicting similar annotations in the EEG report corpus, we discuss the promising results, hoping that our effort will inform the design of novel knowledge capture techniques that will include the language of brain signals.

pdf
HLTRI at W-NUT 2020 Shared Task-3: COVID-19 Event Extraction from Twitter Using Multi-Task Hopfield Pooling
Maxwell Weinzierl | Sanda Harabagiu
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Extracting structured knowledge involving self-reported events related to the COVID-19 pandemic from Twitter has the potential to inform surveillance systems that play a critical role in public health. The event extraction challenge presented by the W-NUT 2020 Shared Task 3 focused on the identification of five types of events relevant to the COVID-19 pandemic and their respective set of pre-defined slots encoding demographic, epidemiological, clinical as well as spatial, temporal or subjective knowledge. Our participation in the challenge led to the design of a neural architecture for jointly identifying all Event Slots expressed in a tweet relevant to an event of interest. This architecture uses COVID-Twitter-BERT as the pre-trained language model. In addition, to learn text span embeddings for each Event Slot, we relied on a special case of Hopfield Networks, namely Hopfield pooling. The results of the shared task evaluation indicate that our system performs best when it is trained on a larger dataset, while it remains competitive when training on smaller datasets.

2016

pdf
Embedding Open-domain Common-sense Knowledge from Text
Travis Goodwin | Sanda Harabagiu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Our ability to understand language often relies on common-sense knowledge: background information the speaker can assume is known by the reader. Similarly, our comprehension of the language used in complex domains relies on access to domain-specific knowledge. Capturing common-sense and domain-specific knowledge can be achieved by taking advantage of recent advances in open information extraction (IE) techniques and, more importantly, of knowledge embeddings, which are multi-dimensional representations of concepts and relations. Building a knowledge graph for representing common-sense knowledge, in which concepts discerned from noun phrases are cast as vertices and lexicalized relations are cast as edges, leads to learning the embeddings of common-sense knowledge accounting for semantic compositionality as well as implied knowledge. Common-sense knowledge is acquired from a vast collection of blogs and books as well as from WordNet. Similarly, medical knowledge is learned from two large sets of electronic health records. The evaluation results of these two forms of knowledge are promising: the same knowledge acquisition methodology based on learning knowledge embeddings works well both for common-sense knowledge and for medical knowledge. Interestingly, the common-sense knowledge that we have acquired was evaluated as being less neutral than the medical knowledge, as it often reflected the opinion of the knowledge utterer. In addition, the acquired medical knowledge was evaluated as more plausible than the common-sense knowledge, reflecting the complexity of acquiring common-sense knowledge due to the pragmatics and economicity of language.

2014

pdf
Unsupervised Event Coreference Resolution
Cosmin Adrian Bejan | Sanda Harabagiu
Computational Linguistics, Volume 40, Issue 2 - June 2014

pdf
Structuring Operative Notes using Active Learning
Kirk Roberts | Sanda Harabagiu | Michael Skinner
Proceedings of BioNLP 2014

pdf
Clinical Data-Driven Probabilistic Graph Processing
Travis Goodwin | Sanda Harabagiu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Electronic Medical Records (EMRs) encode an extraordinary amount of medical knowledge. Collecting and interpreting this knowledge, however, belies a significant level of clinical understanding. Automatically capturing the clinical information is crucial for performing comparative effectiveness research. In this paper, we present a data-driven approach to model semantic dependencies between medical concepts, qualified by the beliefs of physicians. The dependencies, captured in a patient cohort graph of clinical pictures and therapies, are further refined into a probabilistic graphical model which enables efficient inference of patient-centered treatment or test recommendations (based on probabilities). To perform inference on the graphical model, we describe a technique of smoothing the conditional likelihood of medical concepts by their semantically-similar belief values. The experimental results, as compared against clinical guidelines, are very promising.

2013

pdf
The Impact of Selectional Preference Agreement on Semantic Relational Similarity
Bryan Rink | Sanda Harabagiu
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

pdf
Recognizing Spatial Containment Relations between Event Mentions
Kirk Roberts | Michael A. Skinner | Sanda M. Harabagiu
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

2012

pdf
EmpaTweet: Annotating and Detecting Emotions on Twitter
Kirk Roberts | Michael A. Roach | Joseph Johnson | Josh Guthrie | Sanda M. Harabagiu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The rise of micro-blogging in recent years has resulted in significant access to emotion-laden text. Unlike emotion expressed in other textual sources (e.g., blogs, quotes in newswire, email, product reviews, or even clinical text), micro-blogs differ by (1) placing a strict limit on length, resulting in radically new forms of emotional expression, and (2) encouraging users to express their daily thoughts in real-time, often resulting in far more emotion statements than might normally occur. In this paper, we introduce a corpus collected from Twitter with micro-blog posts (or “tweets”) annotated at the tweet-level with seven emotions: ANGER, DISGUST, FEAR, JOY, LOVE, SADNESS, and SURPRISE. We analyze how emotions are distributed in the data we annotated and compare it to the distributions in other emotion-annotated corpora. We also used the annotated corpus to train a classifier that automatically discovers the emotions in tweets. In addition, we present an analysis of the linguistic style used for expressing emotions in our corpus. We hope that these observations will lead to the design of novel emotion detection techniques that account for linguistic style and psycholinguistic theories.

pdf
Annotating Spatial Containment Relations Between Events
Kirk Roberts | Travis Goodwin | Sanda M. Harabagiu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

A significant amount of spatial information in textual documents is hidden within the relationship between events. While humans have an intuitive understanding of these relationships that allows us to recover an object's or event's location, currently no annotated data exists to allow automatic discovery of spatial containment relations between events. We present our process for building such a corpus of manually annotated spatial relations between events. Events form complex predicate-argument structures that model the participants in the event, their roles, as well as the temporal and spatial grounding. In addition, events are not presented in isolation in text; there are explicit and implicit interactions between events that often participate in event structures. In this paper, we focus on five spatial containment relations that may exist between events: (1) SAME, (2) CONTAINS, (3) OVERLAPS, (4) NEAR, and (5) DIFFERENT. Using the transitive closure across these spatial relations, the implicit location of many events and their participants can be discovered. We discuss our annotation schema for spatial containment relations, placing it within the pre-existing theories of spatial representation. We also discuss our annotation guidelines for maintaining annotation quality as well as our process for augmenting SpatialML with spatial containment relations between events. Additionally, we outline some baseline experiments to evaluate the feasibility of developing supervised systems based on this corpus. These results indicate that although the task is challenging, automated methods are capable of discovering spatial containment relations between events.

pdf
UTD: Determining Relational Similarity Using Lexical Patterns
Bryan Rink | Sanda Harabagiu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf
UTD-SpRL: A Joint Approach to Spatial Role Labeling
Kirk Roberts | Sanda Harabagiu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf
UTDHLT: COPACETIC System for Choosing Plausible Alternatives
Travis Goodwin | Bryan Rink | Kirk Roberts | Sanda Harabagiu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf
A generative model for unsupervised discovery of relations and argument classes from clinical texts
Bryan Rink | Sanda Harabagiu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf
Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
Kirk Roberts | Sanda Harabagiu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf
Unsupervised Discovery of Collective Action Frames for Socio-Cultural Analysis
Andrew Hickl | Sanda Harabagiu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

pdf
A Linguistic Resource for Semantic Parsing of Motion Events
Kirk Roberts | Srikanth Gullapalli | Cosmin Adrian Bejan | Sanda Harabagiu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a corpus of annotated motion events and their event structure. We consider motion events triggered by a set of motion-evoking words and contemplate both literal and figurative interpretations of them. Figurative motion events are extracted into the same event structure but are marked as figurative in the corpus. To represent the event structure of motion, we use the FrameNet annotation standard, which encodes motion in over 70 frames. In order to acquire a diverse set of texts that are different from FrameNet's, we crawled blog and news feeds for five different domains: sports, newswire, finance, military, and gossip. We then annotated these documents with an automatic FrameNet parser. Its output was manually corrected to account for missing and incorrect frames as well as missing and incorrect frame elements. The corpus, UTD-MotionEvent, may act as a resource for semantic parsing, detection of figurative language, spatial reasoning, and other tasks.

pdf
UTDMet: Combining WordNet and Corpus Data for Argument Coercion Detection
Kirk Roberts | Sanda Harabagiu
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf
UTD: Classifying Semantic Relations by Combining Lexical and Semantic Resources
Bryan Rink | Sanda Harabagiu
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf
Unsupervised Event Coreference Resolution with Rich Linguistic Features
Cosmin Bejan | Sanda Harabagiu
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2008

pdf
A Linguistic Resource for Discovering Event Structures and Resolving Event Coreference
Cosmin Bejan | Sanda Harabagiu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we present a linguistic resource that annotates event structures in texts. We consider an event structure as a collection of events that interact with each other in a given situation. We interpret the interactions between events as event relations. In this regard, we propose and annotate a set of six relations that best capture the concept of event structure. These relations are: subevent, reason, purpose, enablement, precedence and related. A document from this resource can encode multiple event structures and an event structure can be described across multiple documents. In order to unify event structures, we also annotate inter- and intra-document event coreference. Moreover, we provide methodologies for automatic discovery of event structures from texts. First, we group the events that constitute an event structure into event clusters and then we use supervised learning frameworks to classify the relations that exist between events from the same cluster.

2007

pdf
Textual Entailment Through Extended Lexical Overlap and Lexico-Semantic Matching
Rod Adams | Gabriel Nicolae | Cristina Nicolae | Sanda Harabagiu
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

pdf
UTD-HLT-CG: Semantic Architecture for Metonymy Resolution and Classification of Nominal Relations
Cristina Nicolae | Gabriel Nicolae | Sanda Harabagiu
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

pdf
Impact of Question Decomposition on the Quality of Answer Summaries
Finley Lacatusu | Andrew Hickl | Sanda Harabagiu
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Generating answers to complex questions in the form of multi-document summaries requires access to question decomposition methods. In this paper we present three methods for decomposing complex questions and we evaluate their impact on the responsiveness of the answers they enable.

pdf
An Answer Bank for Temporal Inference
Sanda Harabagiu | Cosmin Adrian Bejan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Answering questions that ask about temporal information involves several forms of inference. In order to develop question answering capabilities that benefit from temporal inference, we believe that a large corpus of questions and answers that are discovered based on temporal information should be available. This paper describes our methodology for creating AnswerTime-Bank, a large corpus of questions and answers on which Question Answering systems can operate using complex temporal inference.

pdf
Methods for Using Textual Entailment in Open-Domain Question Answering
Sanda Harabagiu | Andrew Hickl
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf
FERRET: Interactive Question-Answering for Real-World Environments
Andrew Hickl | Patrick Wang | John Lehmann | Sanda Harabagiu
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

pdf
Using Scenario Knowledge in Automatic Question Answering
Sanda Harabagiu | Andrew Hickl
Proceedings of the Workshop on Task-Focused Summarization and Question Answering

pdf
Enhanced Interactive Question-Answering with Conditional Random Fields
Andrew Hickl | Sanda Harabagiu
Proceedings of the Interactive Question Answering Workshop at HLT-NAACL 2006

2005

pdf
Experiments with Interactive Question-Answering
Sanda Harabagiu | Andrew Hickl | John Lehmann | Dan Moldovan
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

pdf
Semantic parsing based on FrameNet
Cosmin Adrian Bejan | Alessandro Moschitti | Paul Morărescu | Gabriel Nicolae | Sanda Harabagiu
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf
Strategies for Advanced Question Answering
Sanda Harabagiu | Finley Lacatusu
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

pdf
Answering Questions Using Advanced Semantics and Probabilistic Inference
Srini Narayanan | Sanda Harabagiu
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

pdf
Intentions, Implicatures and Processing of Complex Questions
Sanda Harabagiu | Steven Maiorano | Alessandro Moschitti | Cosmin Bejan
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

pdf
A Novel Approach to Focus Identification in Question/Answering Systems
Alessandro Moschitti | Sanda Harabagiu
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

pdf
Experiments with Interactive Question Answering in Complex Scenarios
Andrew Hickl | John Lehmann | John Williams | Sanda Harabagiu
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

pdf
Multi-Document Summarization Using Multiple-Sequence Alignment
V. Finley Lacatusu | Steven J. Maiorano | Sanda M. Harabagiu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper describes a novel clustering-based text summarization system that uses Multiple Sequence Alignment to improve the alignment of sentences within topic clusters. While most current clustering-based summarization systems base their summaries only on the common information contained in a collection of highly-related sentences, our system constructs more informative summaries that incorporate both the redundant and unique contributions of the sentences in the cluster. When evaluated using ROUGE, the summaries produced by our system represent a substantial improvement over the baseline, which is at 63% of the human performance.

pdf
NameNet: a Self-Improving Resource for Name Classification
Paul Morarescu | Sanda Harabagiu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf
Incremental Topic Representations
Sanda Harabagiu
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
Question Answering Based on Semantic Structures
Srini Narayanan | Sanda Harabagiu
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf
COGEX: A Logic Prover for Question Answering
Dan Moldovan | Christine Clark | Sanda Harabagiu | Steve Maiorano
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

pdf
Using Predicate-Argument Structures for Information Extraction
Mihai Surdeanu | Sanda Harabagiu | John Williams | Paul Aarseth
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

2002

pdf
Open-Domain Voice-Activated Question Answering
Sanda Harabagiu | Dan Moldovan | Joe Picone
COLING 2002: The 19th International Conference on Computational Linguistics

pdf
Performance Issues and Error Analysis in an Open-Domain Question Answering System
Dan Moldovan | Marius Pasca | Sanda Harabagiu | Mihai Surdeanu
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

pdf
Multidocument Summarization with GISTexter
Sanda Harabagiu | Finley Lacatusu | Paul Morarescu
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf
Book Reviews: Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval
Sanda Harabagiu
Computational Linguistics, Volume 27, Number 2, June 2001

pdf
Answer Mining from On-Line Documents
Marius Pasca | Sanda Harabagiu
Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering

pdf
Text and Knowledge Mining for Coreference Resolution
Sanda M. Harabagiu | Razvan C. Bunescu | Steven J. Maiorano
Second Meeting of the North American Chapter of the Association for Computational Linguistics

pdf
The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering
Sanda Harabagiu | Dan Moldovan | Marius Pasca | Rada Mihalcea | Mihai Surdeanu | Razvan Bunescu | Roxana Girju | Vasile Rus | Paul Morarescu
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

pdf
Multilingual Coreference Resolution
Sanda M. Harabagiu | Steven J. Maiorano
Sixth Applied Natural Language Processing Conference

pdf
The Structure and Performance of an Open-Domain Question Answering System
Dan Moldovan | Sanda Harabagiu | Marius Pasca | Rada Mihalcea | Roxana Girju | Richard Goodrum | Vasile Rus
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

pdf
Experiments with Open-Domain Textual Question Answering
Sanda M. Harabagiu | Marius A. Pasca | Steven J. Maiorano
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

pdf
Acquisition of Linguistic Patterns for Knowledge-based Information Extraction
Sanda M. Harabagiu | Steven J. Maiorano
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

pdf
Knowledge-Lean Coreference Resolution and its Relation to Textual Cohesion and Coherence
Sanda M. Harabagiu | Steven J. Maiorano
The Relation of Discourse/Dialogue Structure and Reference

pdf
WordNet 2 - A Morphologically and Semantically Enhanced Resource
Sanda M. Harabagiu | George A. Miller | Dan I. Moldovan
SIGLEX99: Standardizing Lexical Resources

1998

pdf
Deriving Metonymic Coercions from WordNet
Sanda M. Harabagiu
Usage of WordNet in Natural Language Processing Systems

1996

pdf
An Application of WordNet to Prepositional Attachment
Sanda M. Harabagiu
34th Annual Meeting of the Association for Computational Linguistics