Jahna Otterbacher

Also published as: Jahna C. Otterbacher

2016

Social and linguistic behavior and its correlation to trait empathy
Marina Litvak | Jahna Otterbacher | Chee Siang Ang | David Atkins
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

A growing body of research exploits social media behaviors to gauge psychological characteristics, though trait empathy has received little attention. Because of its intimate link to the ability to relate to others, our research aims to predict participants’ levels of empathy, given their textual and friending behaviors on Facebook. Using Poisson regression, we compared the variance explained in Davis’ Interpersonal Reactivity Index (IRI) scores on four constructs (empathic concern, personal distress, fantasy, perspective taking), by two classes of variables: 1) post content and 2) linguistic style. Our study lays the groundwork for a greater understanding of empathy’s role in facilitating interactions on social media.

2008

Modeling Document Dynamics: an Evolutionary Approach
Jahna Otterbacher | Dragomir Radev
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

News articles about the same event published over time have properties that challenge NLP and IR applications. A cluster of such texts typically exhibits instances of paraphrase and contradiction, as sources update the facts surrounding the story, often due to an ongoing investigation. The current hypothesis is that the stories evolve over time, beginning with the first text published on a given topic. This is tested using a phylogenetic approach as well as one based on language modeling. The fit of the evolutionary models is evaluated with respect to how well they facilitate the recovery of chronological relationships between the documents. Over all data clusters, the language modeling approach consistently outperforms the phylogenetic model. However, on manually collected clusters in which the documents are published within short time spans of one another, both perform similarly and produce statistically significant results on the document chronology recovery evaluation.

2005

Using Random Walks for Question-focused Sentence Retrieval
Jahna Otterbacher | Güneş Erkan | Dragomir Radev
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Comparing Semantically Related Sentences: The Case of Paraphrase Versus Subsumption
Jahna Otterbacher | Dragomir Radev
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

RevisionBank: A Resource for Revision-based Multi-document Summarization and Evaluation
Jahna Otterbacher | Dragomir Radev
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Multi-document summaries produced via sentence extraction often suffer from a number of cohesion problems, including dangling anaphora, sudden shifts in topic and incorrect or awkward chronological ordering. Therefore, the development of an automated revision process to correct such problems is a research area of current interest. We present the RevisionBank, a corpus of 240 extractive, multi-document summaries that have been manually revised to promote cohesion. The summaries were revised by six linguistics students using a constrained set of revision operations that we previously developed. In the current paper, we describe the process of developing a taxonomy of cohesion problems and corrective revision operators that address such problems, as well as an annotation schema for our corpus. Finally, we discuss how our taxonomy and corpus can be used for the study of revision-based multi-document summarization as well as for summary evaluation.

CST Bank: A Corpus for the Study of Cross-document Structural Relationships
Dragomir Radev | Jahna Otterbacher | Zhu Zhang
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Clusters of multiple news stories related to the same topic exhibit a number of interesting properties. For example, when documents have been published at various points in time or by different authors or news agencies, one finds many instances of paraphrasing, information overlap and even contradiction. The current paper presents the Cross-document Structure Theory (CST) Bank, a collection of multi-document clusters in which pairs of sentences from different documents have been annotated for cross-document structure theory relationships. We will describe how we built the corpus, including our method for reducing the number of sentence pairs to be annotated by our hired judges, using lexical similarity measures. Finally, we will describe how CST and the CST Bank can be applied to different research areas such as multi-document summarization.
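
The abstract above mentions using lexical similarity measures to reduce the number of sentence pairs sent to annotators, but does not say which measure or cutoff was used. The following is a minimal sketch of that general idea only, assuming bag-of-words cosine similarity and an arbitrary threshold of 0.3; the function names (`tokenize`, `cosine_similarity`, `candidate_pairs`) are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter
from itertools import product


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def candidate_pairs(doc_a, doc_b, threshold=0.3):
    """Keep only cross-document sentence pairs scoring at or above threshold.

    The threshold is an assumed, illustrative value, not one from the paper.
    Returns a list of (index_in_doc_a, index_in_doc_b, score) tuples.
    """
    pairs = []
    for (i, sent_a), (j, sent_b) in product(enumerate(doc_a), enumerate(doc_b)):
        score = cosine_similarity(tokenize(sent_a), tokenize(sent_b))
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    # Two toy "documents", each a list of sentences (invented for illustration).
    doc_a = [
        "The plane crashed near the coast on Monday.",
        "Officials have opened an investigation.",
    ]
    doc_b = [
        "A plane crashed near the coast, officials said.",
        "Weather was clear at the time.",
    ]
    for i, j, score in candidate_pairs(doc_a, doc_b):
        print(f"doc_a[{i}] <-> doc_b[{j}]: {score:.3f}")
```

With a filter of this kind, judges would only see pairs that share enough vocabulary to plausibly stand in a cross-document relationship. Raising the threshold shrinks the annotation workload at the cost of missing related pairs that are worded very differently.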