Eduard Hovy

Also published as: Ed Hovy, Eduard H. Hovy

2024

pdf bib abs
Investigating radicalisation indicators in online extremist communities
Christine De Kock | Eduard Hovy
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

We identify and analyse three sociolinguistic indicators of radicalisation within online extremist forums: hostility, longevity and social connectivity. We develop models to predict the maximum degree of each indicator measured over an individual’s lifetime, based on a minimal number of initial interactions. Drawing on data from two diverse extremist communities, our results demonstrate that NLP methods are effective at prioritising at-risk users. This work offers practical insights for intervention strategies and policy development, and highlights an important but under-studied research direction.

pdf abs
Transitive Consistency Constrained Learning for Entity-to-Entity Stance Detection
Haoyang Wen | Eduard Hovy | Alexander Hauptmann
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Entity-to-entity stance detection identifies the stance between a pair of entities with a directed link that indicates the source, target and polarity. It is a streamlined task without the complex dependency structure for structural sentiment analysis, while it is more informative compared to most previous work assuming that the source is the author. Previous work performs entity-to-entity stance detection training on individual entity pairs. However, stances between inter-connected entity pairs may be correlated. In this paper, we propose transitive consistency constrained learning, which first finds connected entity pairs and their stances, and adds an additional objective to enforce the transitive consistency. We explore consistency training on both classification-based and generation-based models and conduct experiments to compare consistency training with previous work and large language models with in-context learning. Experimental results illustrate that the inter-correlation of stances in political news can be used to improve the entity-to-entity stance detection model, while overly strict consistency enforcement may have a negative impact. In addition, we find that large language models struggle with predicting link direction and neutral labels in this task.

pdf abs
A Sentiment Consolidation Framework for Meta-Review Generation
Miao Li | Jey Han Lau | Eduard Hovy
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modern natural language generation systems with Large Language Models (LLMs) exhibit the capability to generate a plausible summary of multiple documents; however, it is uncertain if they truly possess the capability of information consolidation to generate summaries, especially on documents with opinionated information. We focus on meta-review generation, a form of sentiment summarisation for the scientific domain. To make scientific sentiment summarization more grounded, we hypothesize that human meta-reviewers follow a three-layer framework of sentiment consolidation to write meta-reviews. Based on the framework, we propose novel prompting methods for LLMs to generate meta-reviews and evaluation metrics to assess the quality of generated meta-reviews. Our framework is validated empirically as we find that prompting LLMs based on the framework — compared with prompting them with simple instructions — generates better meta-reviews.

2023

pdf abs
A Textual Dataset for Situated Proactive Response Selection
Naoki Otani | Jun Araki | HyeongSik Kim | Eduard Hovy
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent data-driven conversational models are able to return fluent, consistent, and informative responses to many kinds of requests and utterances in task-oriented scenarios. However, these responses are typically limited to just the immediate local topic instead of being wider-ranging and proactively taking the conversation further, for example making suggestions to help customers achieve their goals. This inadequacy reflects a lack of understanding of the interlocutor’s situation and implicit goal. To address the problem, we introduce a task of proactive response selection based on situational information. We present a manually-curated dataset of 1.7k English conversation examples that include situational background information plus for each conversation a set of responses, only some of which are acceptable in the situation. A responsive and informed conversation system should select the appropriate responses and avoid inappropriate ones; doing so demonstrates the ability to adequately understand the initiating request and situation. Our benchmark experiments show that this is not an easy task even for strong neural models, offering opportunities for future research.

In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks.

pdf abs
Towards Open-Domain Twitter User Profile Inference
Haoyang Wen | Zhenxin Xiao | Eduard Hovy | Alexander Hauptmann
Findings of the Association for Computational Linguistics: ACL 2023

Twitter user profile inference utilizes information from Twitter to predict user attributes (e.g., occupation, location), which is controversial because of its usefulness for downstream applications and its potential to reveal users’ privacy. Therefore, it is important for researchers to determine the extent of profiling in a safe environment to facilitate proper use and make the public aware of the potential risks. Contrary to existing approaches on limited attributes, we explore open-domain Twitter user profile inference. We conduct a case study where we collect publicly available WikiData public figure profiles and use diverse WikiData predicates for profile inference. After removing sensitive attributes, our data contains over 150K public figure profiles from WikiData, over 50 different attribute predicates, and over 700K attribute values. We further propose a prompt-based generation method, which can infer values that are implicitly mentioned in the Twitter information. Experimental results show that the generation-based approach can infer more comprehensive user profiles than baseline extraction-based methods, but limitations still remain to be applied for real-world use. We also enclose a detailed ethical statement for our data, potential benefits and risks from this work, and our efforts to mitigate the risks.

pdf abs
Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation
Miao Li | Eduard Hovy | Jey Lau
Findings of the Association for Computational Linguistics: EMNLP 2023

We present PeerSum, a novel dataset for generating meta-reviews of scientific papers. The meta-reviews can be interpreted as abstractive summaries of reviews, multi-turn discussions and the paper abstract. These source documents have a rich inter-document relationship with an explicit hierarchical conversational structure, cross-references and (occasionally) conflicting information. To introduce the structural inductive bias into pre-trained language models, we introduce RAMMER (Relationship-aware Multi-task Meta-review Generator), a model that uses sparse attention based on the conversational structure and a multi-task training objective that predicts metadata features (e.g., review ratings). Our experimental results show that RAMMER outperforms other strong baseline models in terms of a suite of automatic evaluation metrics. Further analyses, however, reveal that RAMMER and other models struggle to handle conflicts in source documents, suggesting meta-review generation is a challenging task and a promising avenue for further research.

The recent explosion of performance of large language models (LLMs) has changed the field of Natural Language Processing (NLP) more abruptly and seismically than any other shift in the field’s 80 year history. This has resulted in concerns that the field will become homogenized and resource-intensive. This new status quo has put many academic researchers, especially PhD students, at a disadvantage. This paper aims to define a new NLP playground by proposing 20+ PhD-dissertation-worthy research directions, covering theoretical analysis, new and challenging problems, learning paradigms and interdisciplinary applications.

pdf abs
Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training
Zhisong Zhang | Emma Strubell | Eduard Hovy
Findings of the Association for Computational Linguistics: EMNLP 2023

In this work we propose a pragmatic method that reduces the annotation cost for structured label spaces using active learning. Our approach leverages partial annotation, which reduces labeling costs for structured outputs by selecting only the most informative sub-structures for annotation. We also utilize self-training to incorporate the current model’s automatic predictions as pseudo-labels for un-annotated sub-structures. A key challenge in effectively combining partial annotation with self-training to reduce annotation cost is determining which sub-structures to select to label. To address this challenge, we adopt an error estimator to adaptively decide the partial selection ratio according to the current model’s capability. In evaluations spanning four structured prediction tasks, we show that our combination of partial annotation and self-training using an adaptive selection ratio reduces annotation cost over strong full annotation baselines under a fair comparison scheme that takes reading time into consideration.

pdf abs
CHARD: Clinical Health-Aware Reasoning Across Dimensions for Text Generation Models
Steven Y. Feng | Vivek Khetan | Bogdan Sacaleanu | Anatole Gershman | Eduard Hovy
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

We motivate and introduce CHARD: Clinical Health-Aware Reasoning across Dimensions, to investigate the capability of text generation models to act as implicit clinical knowledge bases and generate free-flow textual explanations about various health-related conditions across several dimensions. We collect and present an associated dataset, CHARDat, consisting of explanations about 52 health conditions across three clinical dimensions. We conduct extensive experiments using BART and T5 along with data augmentation, and perform automatic, human, and qualitative analyses. We show that while our models can perform decently, CHARD is very challenging with strong potential for further exploration.

pdf abs
PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically
Sedrick Scott Keh | Steven Y. Feng | Varun Gangal | Malihe Alikhani | Eduard Hovy
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Tongue twisters are meaningful sentences that are difficult to pronounce. The process of automatically generating tongue twisters is challenging since the generated utterance must satisfy two conditions at once: phonetic difficulty and semantic meaning. Furthermore, phonetic difficulty is itself hard to characterize and is expressed in natural tongue twisters through a heterogeneous mix of phenomena such as alliteration and homophony. In this paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically. We leverage phoneme representations to capture the notion of phonetic difficulty, and we train language models to generate original tongue twisters on two proposed task settings. To do this, we curate a dataset called TT-Corp, consisting of existing English tongue twisters. Through automatic and human evaluation, as well as qualitative analysis, we show that PANCETTA generates novel, phonetically difficult, fluent, and semantically meaningful tongue twisters.

pdf bib abs
On the Underspecification of Situations in Open-domain Conversational Datasets
Naoki Otani | Jun Araki | HyeongSik Kim | Eduard Hovy
Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)

Advances of open-domain conversational systems have been achieved through the creation of numerous conversation datasets. However, many of the commonly used datasets contain little or no information about the conversational situation, such as relevant objects/people, their properties, and relationships. This absence leads to underspecification of the problem space and typically results in undesired dialogue system behavior. This position paper discusses the current state of the field associated with processing situational information. An analysis of response generation using three datasets shows that explicitly provided situational information can improve the coherence and specificity of generated responses, but further experiments reveal that generation systems can be misled by irrelevant information. Our conclusions from this evaluation provide insights into the problem and directions for future research.

Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human mistakes, socio-linguistic variation, semantically-valid style, syntax changes, as well as artificial constructs that are unambiguous to humans. We demonstrate the efficacy of NL-Augmenter by using its transformations to analyze the robustness of popular language models. We find different models to be differently challenged on different tasks, with quasi-systematic score decreases. The infrastructure, datacards, and robustness evaluation results are publicly available on GitHub for the benefit of researchers working on paraphrase generation, robustness analysis, and low-resource NLP.

pdf
On the Interactions of Structural Constraints and Data Resources for Structured Prediction
Zhisong Zhang | Emma Strubell | Eduard Hovy
Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)

2022

pdf abs
EvEntS ReaLM: Event Reasoning of Entity States via Language Models
Evangelia Spiliopoulou | Artidoro Pagnoni | Yonatan Bisk | Eduard Hovy
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

This paper investigates models of event implications. Specifically, how well models predict entity state-changes, by targeting their understanding of physical attributes. Nominally, Large Language models (LLM) have been exposed to procedural knowledge about how objects interact, yet our benchmarking shows they fail to reason about the world. Conversely, we also demonstrate that existing approaches often misrepresent the surprising abilities of LLMs via improper task encodings and that proper model prompting can dramatically improve performance of reported baseline results across multiple tasks. In particular, our results indicate that our prompting technique is especially useful for unseen attributes (out-of-domain) or when only limited data is available.

pdf abs
Transfer Learning from Semantic Role Labeling to Event Argument Extraction with Template-based Slot Querying
Zhisong Zhang | Emma Strubell | Eduard Hovy
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this work, we investigate transfer learning from semantic role labeling (SRL) to event argument extraction (EAE), considering their similar argument structures. We view the extraction task as a role querying problem, unifying various methods into a single framework. There are key discrepancies on role labels and distant arguments between semantic role and event argument annotations. To mitigate these discrepancies, we specify natural language-like queries to tackle the label mismatch problem and devise argument augmentation to recover distant arguments. We show that SRL annotations can serve as a valuable resource for EAE, and a template-based slot querying strategy is especially effective for facilitating the transfer. In extensive evaluations on two English EAE benchmarks, our proposed model obtains impressive zero-shot results by leveraging SRL annotations, reaching nearly 80% of the fullysupervised scores. It further provides benefits in low-resource cases, where few EAE annotations are available. Moreover, we show that our approach generalizes to cross-domain and multilingual scenarios.

Claim detection and verification are crucial for news understanding and have emerged as promising technologies for mitigating misinformation and disinformation in the news. However, most existing work has focused on claim sentence analysis while overlooking additional crucial attributes (e.g., the claimer and the main object associated with the claim).In this work, we present NewsClaims, a new benchmark for attribute-aware claim detection in the news domain. We extend the claim detection problem to include extraction of additional attributes related to each claim and release 889 claims annotated over 143 news articles. NewsClaims aims to benchmark claim detection systems in emerging scenarios, comprising unseen topics with little or no training data. To this end, we see that zero-shot and prompt-based baselines show promising performance on this benchmark, while still considerably behind human performance.

pdf abs
A Survey of Active Learning for Natural Language Processing
Zhisong Zhang | Emma Strubell | Eduard Hovy
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this work, we provide a literature review of active learning (AL) for its applications in natural language processing (NLP). In addition to a fine-grained categorization of query strategies, we also investigate several other important aspects of applying AL to NLP problems. These include AL for structured prediction tasks, annotation cost, model learning (especially with deep neural models), and starting and stopping AL. Finally, we conclude with a discussion of related topics and future directions.

pdf abs
One Document, Many Revisions: A Dataset for Classification and Description of Edit Intents
Dheeraj Rajagopal | Xuchao Zhang | Michael Gamon | Sujay Kumar Jauhar | Diyi Yang | Eduard Hovy
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Document authoring involves a lengthy revision process, marked by individual edits that are frequently linked to comments. Modeling the relationship between edits and comments leads to a better understanding of document evolution, potentially benefiting applications such as content summarization, and task triaging. Prior work on understanding revisions has primarily focused on classifying edit intents, but falling short of a deeper understanding of the nature of these edits. In this paper, we present explore the challenge of describing an edit at two levels: identifying the edit intent, and describing the edit using free-form text. We begin by defining a taxonomy of general edit intents and introduce a new dataset of full revision histories of Wikipedia pages, annotated with each revision’s edit intent. Using this dataset, we train a classifier that achieves a 90% accuracy in identifying edit intent. We use this classifier to train a distantly-supervised model that generates a high-level description of a revision in free-form text. Our experimental results show that incorporating edit intent information aids in generating better edit descriptions. We establish a set of baselines for the edit description task, achieving a best score of 28 ROUGE, thus demonstrating the effectiveness of our layered approach to edit understanding.

pdf abs
PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification Data for Learning Enhanced Generation
Sedrick Scott Keh | Kevin Lu | Varun Gangal | Steven Y. Feng | Harsh Jhamtani | Malihe Alikhani | Eduard Hovy
Proceedings of the 29th International Conference on Computational Linguistics

A personification is a figure of speech that endows inanimate entities with properties and actions typically seen as requiring animacy. In this paper, we explore the task of personification generation. To this end, we propose PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation. We curate a corpus of personifications called PersonifCorp, together with automatically generated de-personified literalizations of these personifications. We demonstrate the usefulness of this parallel corpus by training a seq2seq model to personify a given literal input. Both automatic and human evaluations show that fine-tuning with PersonifCorp leads to significant gains in personification-related qualities such as animacy and interestingness. A detailed qualitative analysis also highlights key strengths and imperfections of PINEAPPLE over baselines, demonstrating a strong ability to generate diverse and creative personifications that enhance the overall appeal of a sentence.

2021

pdf abs
Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction
Maria Ryskina | Eduard Hovy | Taylor Berg-Kirkpatrick | Matthew R. Gormley
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Traditionally, character-level transduction problems have been solved with finite-state models designed to encode structural and linguistic knowledge of the underlying process, whereas recent approaches rely on the power and flexibility of sequence-to-sequence models with attention. Focusing on the less explored unsupervised learning scenario, we compare the two model classes side by side and find that they tend to make different types of errors even when achieving comparable performance. We analyze the distributions of different error classes using two unsupervised tasks as testbeds: converting informally romanized text into the native script of its language (for Russian, Arabic, and Kannada) and translating between a pair of closely related languages (Serbian and Bosnian). Finally, we investigate how combining finite-state and sequence-to-sequence models at decoding time affects the output quantitatively and qualitatively.

pdf abs
More Identifiable yet Equally Performant Transformers for Text Classification
Rishabh Bhardwaj | Navonil Majumder | Soujanya Poria | Eduard Hovy
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Interpretability is an important aspect of the trustworthiness of a model’s predictions. Transformer’s predictions are widely explained by the attention weights, i.e., a probability distribution generated at its self-attention unit (head). Current empirical studies provide shreds of evidence that attention weights are not explanations by proving that they are not unique. A recent study showed theoretical justifications to this observation by proving the non-identifiability of attention weights. For a given input to a head and its output, if the attention weights generated in it are unique, we call the weights identifiable. In this work, we provide deeper theoretical analysis and empirical observations on the identifiability of attention weights. Ignored in the previous works, we find the attention weights are more identifiable than we currently perceive by uncovering the hidden role of the key vector. However, the weights are still prone to be non-unique attentions that make them unfit for interpretation. To tackle this issue, we provide a variant of the encoder layer that decouples the relationship between key and value vector and provides identifiable weights up to the desired length of the input. We prove the applicability of such variations by providing empirical justifications on varied text classification tasks. The implementations are available at https://github.com/declare-lab/identifiable-transformers.

pdf abs
Style is NOT a single variable: Case Studies for Cross-Stylistic Language Understanding
Dongyeop Kang | Eduard Hovy
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Every natural text is written in some style. Style is formed by a complex combination of different stylistic factors, including formality markers, emotions, metaphors, etc. One cannot form a complete understanding of a text without considering these factors. The factors combine and co-vary in complex ways to form styles. Studying the nature of the covarying combinations sheds light on stylistic language in general, sometimes called cross-style language understanding. This paper provides the benchmark corpus (XSLUE) that combines existing datasets and collects a new one for sentence-level cross-style language understanding and evaluation. The benchmark contains text in 15 different styles under the proposed four theoretical groupings: figurative, personal, affective, and interpersonal groups. For valid evaluation, we collect an additional diagnostic set by annotating all 15 styles on the same text. Using XSLUE, we propose three interesting cross-style applications in classification, correlation, and generation. First, our proposed cross-style classifier trained with multiple styles together helps improve overall classification performance against individually-trained style classifiers. Second, our study shows that some styles are highly dependent on each other in human-written text. Finally, we find that combinations of some contradictive styles likely generate stylistically less appropriate text. We believe our benchmark and case studies help explore interesting future directions for cross-style research. The preprocessed datasets and code are publicly available.

pdf abs
Dual Graph Convolutional Networks for Aspect-based Sentiment Analysis
Ruifan Li | Hao Chen | Fangxiang Feng | Zhanyu Ma | Xiaojie Wang | Eduard Hovy
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Aspect-based sentiment analysis is a fine-grained sentiment classification task. Recently, graph neural networks over dependency trees have been explored to explicitly model connections between aspects and opinion words. However, the improvement is limited due to the inaccuracy of the dependency parsing results and the informal expressions and complexity of online reviews. To overcome these challenges, in this paper, we propose a dual graph convolutional networks (DualGCN) model that considers the complementarity of syntax structures and semantic correlations simultaneously. Particularly, to alleviate dependency parsing errors, we design a SynGCN module with rich syntactic knowledge. To capture semantic correlations, we design a SemGCN module with self-attention mechanism. Furthermore, we propose orthogonal and differential regularizers to capture semantic correlations between words precisely by constraining attention scores in the SemGCN module. The orthogonal regularizer encourages the SemGCN to learn semantically correlated words with less overlap for each word. The differential regularizer encourages the SemGCN to learn semantic features that the SynGCN fails to capture. Experimental results on three public datasets show that our DualGCN model outperforms state-of-the-art methods and verify the effectiveness of our model.

pdf abs
Classifying Argumentative Relations Using Logical Mechanisms and Argumentation Schemes
Yohan Jo | Seojin Bang | Chris Reed | Eduard Hovy
Transactions of the Association for Computational Linguistics, Volume 9

While argument mining has achieved significant success in classifying argumentative relations between statements (support, attack, and neutral), we have a limited computational understanding of logical mechanisms that constitute those relations. Most recent studies rely on black-box models, which are not as linguistically insightful as desired. On the other hand, earlier studies use rather simple lexical features, missing logical relations between statements. To overcome these limitations, our work classifies argumentative relations based on four logical and theory-informed mechanisms between two statements, namely, (i) factual consistency, (ii) sentiment coherence, (iii) causal relation, and (iv) normative relation. We demonstrate that our operationalization of these logical mechanisms classifies argumentative relations without directly training on data labeled with the relations, significantly better than several unsupervised baselines. We further demonstrate that these mechanisms also improve supervised classifiers through representation learning.

Consistency of a model—that is, the invariance of its behavior under meaning-preserving alternations in its input—is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel🤘, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel🤘, we show that the consistency of all PLMs we experiment with is poor— though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.1

During production of this paper, an error was introduced to the formula on the bottom of the right column of page 1020. In the last two terms of the formula, the n and m subscripts were swapped. The correct formula is:Lc=∑n=1k∑m=n+1kDKL(Qnri∥Qmri)+DKL(Qmri∥Qnri)The paper has been updated.

pdf abs
StylePTB: A Compositional Benchmark for Fine-grained Controllable Text Style Transfer
Yiwei Lyu | Paul Pu Liang | Hai Pham | Eduard Hovy | Barnabás Póczos | Ruslan Salakhutdinov | Louis-Philippe Morency
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Text style transfer aims to controllably generate text with targeted stylistic changes while maintaining core meaning from the source sentence constant. Many of the existing style transfer benchmarks primarily focus on individual high-level semantic changes (e.g. positive to negative), which enable controllability at a high level but do not offer fine-grained control involving sentence structure, emphasis, and content of the sentence. In this paper, we introduce a large-scale benchmark, StylePTB, with (1) paired sentences undergoing 21 fine-grained stylistic changes spanning atomic lexical, syntactic, semantic, and thematic transfers of text, as well as (2) compositions of multiple transfers which allow modeling of fine-grained stylistic changes as building blocks for more complex, high-level transfers. By benchmarking existing methods on StylePTB, we find that they struggle to model fine-grained changes and have an even more difficult time composing multiple styles. As a result, StylePTB brings novel challenges that we hope will encourage future research in controllable text style transfer, compositional models, and learning disentangled representations. Solving these challenges would present important steps towards controllable text generation.

pdf abs
Comparing Span Extraction Methods for Semantic Role Labeling
Zhisong Zhang | Emma Strubell | Eduard Hovy
Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP 2021)

In this work, we empirically compare span extraction methods for the task of semantic role labeling (SRL). While recent progress incorporating pre-trained contextualized representations into neural encoders has greatly improved SRL F1 performance on popular benchmarks, the potential costs and benefits of structured decoding in these models have become less clear. With extensive experiments on PropBank SRL datasets, we find that more structured decoding methods outperform BIO-tagging when using static (word type) embeddings across all experimental settings. However, when used in conjunction with pre-trained contextualized word representations, the benefits are diminished. We also experiment in cross-genre and cross-lingual settings and find similar trends. We further perform speed comparisons and provide analysis on the accuracy-efficiency trade-offs among different decoding methods.

pdf abs
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
Abhilasha Ravichander | Siddharth Dalmia | Maria Ryskina | Florian Metze | Eduard Hovy | Alan W Black
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions to languages supported by the QA system. While there has been significant community attention devoted to identifying correct answers in passages assuming a perfectly formed question, we show that components in the pipeline that precede an answering engine can introduce varied and considerable sources of error, and performance can degrade substantially based on these upstream noise sources even for powerful pre-trained QA models. We conclude that there is substantial room for progress before QA systems can be effectively deployed, highlight the need for QA evaluation to expand to consider real-world use, and hope that our findings will spur greater community interest in the issues that arise when our systems actually need to be of utility to humans.

pdf abs
Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
Abhilasha Ravichander | Yonatan Belinkov | Eduard Hovy
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Although neural models have achieved impressive results on several NLP benchmarks, little is understood about the mechanisms they use to perform language tasks. Thus, much recent attention has been devoted to analyzing the sentence representations learned by neural encoders, through the lens of ‘probing’ tasks. However, to what extent was the information encoded in sentence representations, as discovered through a probe, actually used by the model to perform its task? In this work, we examine this probing paradigm through a case study in Natural Language Inference, showing that models can learn to encode linguistic properties even if they are not needed for the task on which the model was trained. We further identify that pretrained word embeddings play a considerable role in encoding these properties rather than the training task itself, highlighting the importance of careful controls when designing probing experiments. Finally, through a set of controlled synthetic tasks, we demonstrate models can encode these properties considerably above chance-level, even when distributed in the data as random noise, calling into question the interpretation of absolute claims on probing tasks.

pdf abs
SAPPHIRE: Approaches for Enhanced Concept-to-Text Generation
Steven Y. Feng | Jessica Huynh | Chaitanya Prasad Narisetty | Eduard Hovy | Varun Gangal
Proceedings of the 14th International Conference on Natural Language Generation

We motivate and propose a suite of simple but effective improvements for concept-to-text generation called SAPPHIRE: Set Augmentation and Post-hoc PHrase Infilling and REcombination. We demonstrate their effectiveness on generative commonsense reasoning, a.k.a. the CommonGen task, through experiments using both BART and T5 models. Through extensive automatic and human evaluation, we show that SAPPHIRE noticeably improves model performance. An in-depth qualitative analysis illustrates that SAPPHIRE effectively addresses many issues of the baseline model generations, including lack of commonsense, insufficient specificity, and poor fluency.

pdf
Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation
Varun Gangal | Harsh Jhamtani | Eduard Hovy | Taylor Berg-Kirkpatrick
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
Could you give me a hint ? Generating inference graphs for defeasible reasoning
Aman Madaan | Dheeraj Rajagopal | Niket Tandon | Yiming Yang | Eduard Hovy
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Finding counterevidence to statements is key to many tasks, including counterargument generation. We build a system that, given a statement, retrieves counterevidence from diverse sources on the Web. At the core of this system is a natural language inference (NLI) model that determines whether a candidate sentence is valid counterevidence or not. Most NLI models to date, however, lack proper reasoning abilities necessary to find counterevidence that involves complex inference. Thus, we present a knowledge-enhanced NLI model that aims to handle causality- and example-based inference by incorporating knowledge graphs. Our NLI model outperforms baselines for NLI tasks, especially for instances that require the targeted inference. In addition, this NLI model further improves the counterevidence retrieval system, notably finding complex counterevidence better.

pdf abs
On the Benefit of Syntactic Supervision for Cross-lingual Transfer in Semantic Role Labeling
Zhisong Zhang | Emma Strubell | Eduard Hovy
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Although recent developments in neural architectures and pre-trained representations have greatly increased state-of-the-art model performance on fully-supervised semantic role labeling (SRL), the task remains challenging for languages where supervised SRL training data are not abundant. Cross-lingual learning can improve performance in this setting by transferring knowledge from high-resource languages to low-resource ones. Moreover, we hypothesize that annotations of syntactic dependencies can be leveraged to further facilitate cross-lingual transfer. In this work, we perform an empirical exploration of the helpfulness of syntactic supervision for crosslingual SRL within a simple multitask learning scheme. With comprehensive evaluations across ten languages (in addition to English) and three SRL benchmark datasets, including both dependency- and span-based SRL, we show the effectiveness of syntactic supervision in low-resource scenarios.

pdf abs
Think about it! Improving defeasible reasoning by first modeling the question scenario.
Aman Madaan | Niket Tandon | Dheeraj Rajagopal | Peter Clark | Yiming Yang | Eduard Hovy
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence. Existing cognitive science literature on defeasible reasoning suggests that a person forms a “mental model” of the problem scenario before answering questions. Our research goal asks whether neural models can similarly benefit from envisioning the question scenario before answering a defeasible query. Our approach is, given a question, to have a model first create a graph of relevant influences, and then leverage that graph as an additional input when answering the question. Our system, CURIOUS, achieves a new state-of-the-art on three different defeasible reasoning datasets. This result is significant as it illustrates that performance can be improved by guiding a system to “think about” a question and explicitly model the scenario, rather than answering reflexively.

pdf abs
Investigating Robustness of Dialog Models to Popular Figurative Language Constructs
Harsh Jhamtani | Varun Gangal | Eduard Hovy | Taylor Berg-Kirkpatrick
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Humans often employ figurative language use in communication, including during interactions with dialog systems. Thus, it is important for real-world dialog systems to be able to handle popular figurative language constructs like metaphor and simile. In this work, we analyze the performance of existing dialog models in situations where the input dialog context exhibits use of figurative language. We observe large gaps in handling of figurative language when evaluating the models on two open domain dialog datasets. When faced with dialog contexts consisting of figurative language, some models show very large drops in performance compared to contexts without figurative language. We encourage future research in dialog modeling to separately analyze and report results on figurative language in order to better test model capabilities relevant to real-world use. Finally, we propose lightweight solutions to help existing models become more robust to figurative language by simply using an external resource to translate figurative language to literal (non-figurative) forms while preserving the meaning to the best extent possible.

2020

pdf abs
On the Systematicity of Probing Contextualized Word Representations: The Case of Hypernymy in BERT
Abhilasha Ravichander | Eduard Hovy | Kaheer Suleman | Adam Trischler | Jackie Chi Kit Cheung
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Contextualized word representations have become a driving force in NLP, motivating widespread interest in understanding their capabilities and the mechanisms by which they operate. Particularly intriguing is their ability to identify and encode conceptual abstractions. Past work has probed BERT representations for this competence, finding that BERT can correctly retrieve noun hypernyms in cloze tasks. In this work, we ask the question: do probing studies shed light on systematic knowledge in BERT representations? As a case study, we examine hypernymy knowledge encoded in BERT representations. In particular, we demonstrate through a simple consistency probe that the ability to correctly retrieve hypernyms in cloze tasks, as used in prior work, does not correspond to systematic knowledge in BERT. Our main conclusion is cautionary: even if BERT demonstrates high probing accuracy for a particular competence, it does not necessarily follow that BERT ‘understands’ a concept, and it cannot be expected to systematically generalize across applicable contexts.

Next to keeping up with the growing literature in their own and related fields, scholars increasingly also need to rebut pseudo-science and disinformation. To address these challenges, computational work on enhancing search, summarization, and analysis of scholarly documents has flourished. However, the various strands of research on scholarly document processing remain fragmented. To reach to the broader NLP and AI/ML community, pool distributed efforts and enable shared access to published research, we held the 1st Workshop on Scholarly Document Processing at EMNLP 2020 as a virtual event. The SDP workshop consisted of a research track (including a poster session), two invited talks and three Shared Tasks (CL-SciSumm, Lay-Summ and LongSumm), geared towards easier access to scientific methods and results. Website: https://ornlcda.github.io/SDProc

pdf abs
Overview and Insights from the Shared Tasks at Scholarly Document Processing 2020: CL-SciSumm, LaySumm and LongSumm
Muthu Kumar Chandrasekaran | Guy Feigenblat | Eduard Hovy | Abhilasha Ravichander | Michal Shmueli-Scheuer | Anita de Waard
Proceedings of the First Workshop on Scholarly Document Processing

We present the results of three Shared Tasks held at the Scholarly Document Processing Workshop at EMNLP2020: CL-SciSumm, LaySumm and LongSumm. We report on each of the tasks, which received 18 submissions in total, with some submissions addressing two or three of the tasks. In summary, the quality and quantity of the submissions show that there is ample interest in scholarly document summarization, and the state of the art in this domain is at a midway point between being an impossible task and one that is fully resolved.

pdf abs
Definition Frames: Using Definitions for Hybrid Concept Representations
Evangelia Spiliopoulou | Artidoro Pagnoni | Eduard Hovy
Proceedings of the 28th International Conference on Computational Linguistics

Advances in word representations have shown tremendous improvements in downstream NLP tasks, but lack semantic interpretability. In this paper, we introduce Definition Frames (DF), a matrix distributed representation extracted from definitions, where each dimension is semantically interpretable. DF dimensions correspond to the Qualia structure relations: a set of relations that uniquely define a term. Our results show that DFs have competitive performance with other distributional semantic approaches on word similarity tasks.

pdf abs
Machine-Aided Annotation for Fine-Grained Proposition Types in Argumentation
Yohan Jo | Elijah Mayfield | Chris Reed | Eduard Hovy
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce a corpus of the 2016 U.S. presidential debates and commentary, containing 4,648 argumentative propositions annotated with fine-grained proposition types. Modern machine learning pipelines for analyzing argument have difficulty distinguishing between types of propositions based on their factuality, rhetorical positioning, and speaker commitment. Inability to properly account for these facets leaves such systems inaccurate in understanding of fine-grained proposition types. In this paper, we demonstrate an approach to annotating for four complex proposition types, namely normative claims, desires, future possibility, and reported speech. We develop a hybrid machine learning and human workflow for annotation that allows for efficient and reliable annotation of complex linguistic phenomena, and demonstrate with preliminary analysis of rhetorical strategies and structure in presidential debates. This new dataset and method can support technical researchers seeking more nuanced representations of argument, as well as argumentation theorists developing new quantitative analyses.

pdf abs
Measuring Forecasting Skill from Text
Shi Zong | Alan Ritter | Eduard Hovy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

People vary in their ability to make accurate predictions about the future. Prior studies have shown that some individuals can predict the outcome of future events with consistently better accuracy. This leads to a natural question: what makes some forecasters better than others? In this paper we explore connections between the language people use to describe their predictions and their forecasting skill. Datasets from two different forecasting domains are explored: (1) geopolitical forecasts from Good Judgment Open, an online prediction forum and (2) a corpus of company earnings forecasts made by financial analysts. We present a number of linguistic metrics which are computed over text associated with people’s predictions about the future including: uncertainty, readability, and emotion. By studying linguistic factors associated with predictions, we are able to shed some light on the approach taken by skilled forecasters. Furthermore, we demonstrate that it is possible to accurately predict forecasting skill using a model that is based solely on language. This could potentially be useful for identifying accurate predictions or potentially skilled forecasters earlier.

pdf abs
SCDE: Sentence Cloze Dataset with High Quality Distractors From Examinations
Xiang Kong | Varun Gangal | Eduard Hovy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce SCDE, a dataset to evaluate the performance of computational models through sentence prediction. SCDE is a human created sentence cloze dataset, collected from public school English examinations. Our task requires a model to fill up multiple blanks in a passage from a shared candidate set with distractors designed by English teachers. Experimental results demonstrate that this task requires the use of non-local, discourse-level context beyond the immediate sentence neighborhood. The blanks require joint solving and significantly impair each other’s context. Furthermore, through ablations, we show that the distractors are of high quality and make the task more challenging. Our experiments show that there is a significant performance gap between advanced models (72%) and humans (87%), encouraging future models to bridge this gap.

pdf abs
A Two-Step Approach for Implicit Event Argument Detection
Zhisong Zhang | Xiang Kong | Zhengzhong Liu | Xuezhe Ma | Eduard Hovy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this work, we explore the implicit event argument detection task, which studies event arguments beyond sentence boundaries. The addition of cross-sentence argument candidates imposes great challenges for modeling. To reduce the number of candidates, we adopt a two-step approach, decomposing the problem into two sub-problems: argument head-word detection and head-to-span expansion. Evaluated on the recent RAMS dataset (Ebner et al., 2020), our model achieves overall better performance than a strong sequence labeling baseline. We further provide detailed error analysis, presenting where the model mainly makes errors and indicating directions for future improvements. It remains a challenge to detect implicit arguments, calling for more future work of document-level modeling for this task.

pdf bib
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems
Steffen Eger | Yang Gao | Maxime Peyrard | Wei Zhao | Eduard Hovy
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

pdf abs
Nested Named Entity Recognition via Second-best Sequence Learning and Decoding
Takashi Shibuya | Eduard Hovy
Transactions of the Association for Computational Linguistics, Volume 8

When an entity name contains other names within it, the identification of all combinations of names can become difficult and expensive. We propose a new method to recognize not only outermost named entities but also inner nested ones. We design an objective function for training a neural model that treats the tag sequence for nested entities as the second best path within the span of their parent entity. In addition, we provide the decoding method for inference that extracts entities iteratively from outermost ones to inner ones in an outside-to-inside way. Our method has no additional hyperparameters to the conditional random field based model widely used for flat named entity recognition tasks. Experiments demonstrate that our method performs better than or at least as well as existing methods capable of handling nested entities, achieving F1-scores of 85.82%, 84.34%, and 77.36% on ACE-2004, ACE-2005, and GENIA datasets, respectively.

pdf abs
GenAug: Data Augmentation for Finetuning Text Generators
Steven Y. Feng | Varun Gangal | Dongyeop Kang | Teruko Mitamura | Eduard Hovy
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves to a peak at approximately three times the amount of original data.

pdf abs
An Empirical Exploration of Local Ordering Pre-training for Structured Prediction
Zhisong Zhang | Xiang Kong | Lori Levin | Eduard Hovy
Findings of the Association for Computational Linguistics: EMNLP 2020

Recently, pre-training contextualized encoders with language model (LM) objectives has been shown an effective semi-supervised method for structured prediction. In this work, we empirically explore an alternative pre-training method for contextualized encoders. Instead of predicting words in LMs, we “mask out” and predict word order information, with a local ordering strategy and word-selecting objectives. With evaluations on three typical structured prediction tasks (dependency parsing, POS tagging, and NER) over four languages (English, Finnish, Czech, and Italian), we show that our method is consistently beneficial. We further conduct detailed error analysis, including one that examines a specific type of parsing error where the head is misidentified. The results show that pre-trained contextual encoders can bring improvements in a structured way, suggesting that they may be able to capture higher-order patterns and feature combinations from unlabeled data.

pdf abs
What-if I ask you to explain: Explaining the effects of perturbations in procedural text
Dheeraj Rajagopal | Niket Tandon | Peter Clark | Bhavana Dalvi | Eduard Hovy
Findings of the Association for Computational Linguistics: EMNLP 2020

Our goal is to explain the effects of perturbations in procedural text, e.g., given a passage describing a rabbit’s life cycle, explain why illness (the perturbation) may reduce the rabbit population (the effect). Although modern systems are able to solve the original prediction task well (e.g., illness results in less rabbits), the explanation task - identifying the causal chain of events from perturbation to effect - remains largely unaddressed, and is the goal of this research. We present QUARTET, a system that constructs such explanations from paragraphs, by modeling the explanation task as a multitask learning problem. QUARTET constructs explanations from the sentences in the procedural text, achieving ~18 points better on explanation accuracy compared to several strong baselines on a recent process comprehension benchmark. On an end task on this benchmark, we show a surprising finding that good explanations do not have to come at the expense of end task performance, in fact leading to a 7% F1 improvement over SOTA.

pdf abs
Event-Related Bias Removal for Real-time Disaster Events
Salvador Medina Maza | Evangelia Spiliopoulou | Eduard Hovy | Alexander Hauptmann
Findings of the Association for Computational Linguistics: EMNLP 2020

Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks. Detecting actionable posts that contain useful information requires rapid analysis of huge volumes of data in real-time. This poses a complex problem due to the large amount of posts that do not contain any actionable information. Furthermore, the classification of information in real-time systems requires training on out-of-domain data, as we do not have any data from a new emerging crisis. Prior work focuses on models pre-trained on similar event types. However, those models capture unnecessary event-specific biases, like the location of the event, which affect the generalizability and performance of the classifiers on new unseen data from an emerging new event. In our work, we train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification.

pdf bib abs
BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? - A Study on the RAMS Dataset
Varun Gangal | Eduard Hovy
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Using the attention map based probing framework from (Clark et al., 2019), we observe that, on the RAMS dataset (Ebner et al., 2020), BERT’s attention heads have modest but well above-chance ability to spot event arguments sans any training or domain finetuning, varying from a low of 17.77% for Place to a high of 51.61% for Artifact. Next, we find that linear combinations of these heads, estimated with approx. 11% of available total event argument detection supervision, can push performance well higher for some roles — highest two being Victim (68.29% Accuracy) and Artifact (58.82% Accuracy). Furthermore, we investigate how well our methods do for cross-sentence event arguments. We propose a procedure to isolate “best heads” for cross-sentence argument detection separately of those for intra-sentence arguments. The heads thus estimated have superior cross-sentence performance compared to their jointly estimated equivalents, albeit only under the unrealistic assumption that we already know the argument is present in another sentence. Lastly, we seek to isolate to what extent our numbers stem from lexical frequency based associations between gold arguments and roles. We propose NONCE, a scheme to create adversarial test examples by replacing gold arguments with randomly generated “nonce” words. We find that learnt linear combinations are robust to NONCE, though individual best heads can be more sensitive.

pdf abs
Exploring Neural Entity Representations for Semantic Information
Andrew Runge | Eduard Hovy
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Neural methods for embedding entities are typically extrinsically evaluated on downstream tasks and, more recently, intrinsically using probing tasks. Downstream task-based comparisons are often difficult to interpret due to differences in task structure, while probing task evaluations often look at only a few attributes and models. We address both of these issues by evaluating a diverse set of eight neural entity embedding methods on a set of simple probing tasks, demonstrating which methods are able to remember words used to describe entities, learn type, relationship and factual information, and identify how frequently an entity is mentioned. We also compare these methods in a unified framework on two entity linking tasks and discuss how they generalize to different model architectures and datasets.

pdf bib abs
Detecting Attackable Sentences in Arguments
Yohan Jo | Seojin Bang | Emaad Manzoor | Eduard Hovy | Chris Reed
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Finding attackable sentences in an argument is the first step toward successful refutation in argumentation. We present a first large-scale analysis of sentence attackability in online arguments. We analyze driving reasons for attacks in argumentation and identify relevant characteristics of sentences. We demonstrate that a sentence’s attackability is associated with many of these characteristics regarding the sentence’s content, proposition types, and tone, and that an external knowledge source can provide useful information about attackability. Building on these findings, we demonstrate that machine learning models can automatically detect attackable sentences in arguments, significantly better than several baselines and comparably well to laypeople.

pdf bib abs
Extracting Implicitly Asserted Propositions in Argumentation
Yohan Jo | Jacky Visser | Chris Reed | Eduard Hovy
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Argumentation accommodates various rhetorical devices, such as questions, reported speech, and imperatives. These rhetorical tools usually assert argumentatively relevant propositions rather implicitly, so understanding their true meaning is key to understanding certain arguments properly. However, most argument mining systems and computational linguistics research have paid little attention to implicitly asserted propositions in argumentation. In this paper, we examine a wide range of computational methods for extracting propositions that are implicitly asserted in questions, reported speech, and imperatives in argumentation. By evaluating the models on a corpus of 2016 U.S. presidential debates and online commentary, we demonstrate the effectiveness and limitations of the computational models. Our study may inform future research on argument mining and the semantics of these rhetorical devices in argumentation.

pdf abs
Incorporating a Local Translation Mechanism into Non-autoregressive Translation
Xiang Kong | Zhisong Zhang | Eduard Hovy
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this work, we introduce a novel local autoregressive translation (LAT) mechanism into non-autoregressive translation (NAT) models so as to capture local dependencies among target outputs. Specifically, for each target decoding position, instead of only one token, we predict a short sequence of tokens in an autoregressive way. We further design an efficient merging algorithm to align and merge the output pieces into one final output sequence. We integrate LAT into the conditional masked language model (CMLM) (Ghazvininejad et al.,2019) and similarly adopt iterative decoding. Empirical results on five translation tasks show that compared with CMLM, our method achieves comparable or better performance with fewer decoding iterations, bringing a 2.5x speedup. Further analysis indicates that our method reduces repeated translations and performs better at longer sentences. Our code will be released to the public.

We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a new task formulation where given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as judged by humans and completely vetted), and large-scale dataset comprising 29,928 state changes over 4,050 sentences from 810 procedural real-world paragraphs from WikiHow.com. A current state-of-the-art generation model on this task achieves 16.1% F1 based on BLEU metric, leaving enough room for novel model architectures.

pdf abs
Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task
Dongyeop Kang | Eduard Hovy
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Despite the recent success of contextualized language models on various NLP tasks, language model itself cannot capture textual coherence of a long, multi-sentence document (e.g., a paragraph). Humans often make structural decisions on what and how to say about before making utterances. Guiding surface realization with such high-level decisions and structuring text in a coherent way is essentially called a planning process. Where can the model learn such high-level coherence? A paragraph itself contains various forms of inductive coherence signals called self-supervision in this work, such as sentence orders, topical keywords, rhetorical structures, and so on. Motivated by that, this work proposes a new paragraph completion task PARCOM; predicting masked sentences in a paragraph. However, the task suffers from predicting and selecting appropriate topical content with respect to the given context. To address that, we propose a self-supervised text planner SSPlanner that predicts what to say first (content prediction), then guides the pretrained language model (surface realization) using the predicted content. SSPlanner outperforms the baseline generation models on the paragraph completion task in both automatic and human evaluation. We also find that a combination of noun and verb types of keywords is the most effective for content selection. As more number of content keywords are provided, overall generation quality also increases.

2019

pdf abs
Exploring Numeracy in Word Embeddings
Aakanksha Naik | Abhilasha Ravichander | Carolyn Rose | Eduard Hovy
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embeddings are now pervasive across NLP subfields as the de-facto method of forming text representataions. In this work, we show that existing embedding models are inadequate at constructing representations that capture salient aspects of mathematical meaning for numbers, which is important for language understanding. Numbers are ubiquitous and frequently appear in text. Inspired by cognitive studies on how humans perceive numbers, we develop an analysis framework to test how well word embeddings capture two essential properties of numbers: magnitude (e.g. 3<4) and numeration (e.g. 3=three). Our experiments reveal that most models capture an approximate notion of magnitude, but are inadequate at capturing numeration. We hope that our observations provide a starting point for the development of methods which better capture numeracy in NLP systems.

pdf abs
Toward Comprehensive Understanding of a Sentiment Based on Human Motives
Naoki Otani | Eduard Hovy
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In sentiment detection, the natural language processing community has focused on determining holders, facets, and valences, but has paid little attention to the reasons for sentiment decisions. Our work considers human motives as the driver for human sentiments and addresses the problem of motive detection as the first step. Following a study in psychology, we define six basic motives that cover a wide range of topics appearing in review texts, annotate 1,600 texts in restaurant and laptop domains with the motives, and report the performance of baseline methods on this new dataset. We also show that cross-domain transfer learning boosts detection performance, which indicates that these universal motives exist across different domains.

pdf abs
An Empirical Investigation of Structured Output Modeling for Graph-based Neural Dependency Parsing
Zhisong Zhang | Xuezhe Ma | Eduard Hovy
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we investigate the aspect of structured output modeling for the state-of-the-art graph-based neural dependency parser (Dozat and Manning, 2017). With evaluations on 14 treebanks, we empirically show that global output-structured models can generally obtain better performance, especially on the metric of sentence-level Complete Match. However, probably because neural models already learn good global views of the inputs, the improvement brought by structured output modeling is modest.

pdf abs
EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference
Abhilasha Ravichander | Aakanksha Naik | Carolyn Rose | Eduard Hovy
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE, and find that on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline Q-REAS that manipulates quantities symbolically. In comparison to the best performing NLI model, it achieves success on numerical reasoning tests (+24.2 %), but has limited verbal reasoning capabilities (-8.1 %). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding.

pdf abs
Word Embedding-Based Automatic MT Evaluation Metric using Word Position Information
Hiroshi Echizen’ya | Kenji Araki | Eduard Hovy
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a new automatic evaluation metric for machine translation. Our proposed metric is obtained by adjusting the Earth Mover’s Distance (EMD) to the evaluation task. The EMD measure is used to obtain the distance between two probability distributions consisting of some signatures having a feature and a weight. We use word embeddings, sentence-level tf-idf, and cosine similarity between two word embeddings, respectively, as the features, weight, and the distance between two features. Results show that our proposed metric can evaluate machine translation based on word meaning. Moreover, for distance, cosine similarity and word position information are used to address word-order differences. We designate this metric as Word Embedding-Based automatic MT evaluation using Word Position Information (WE_WPI). A meta-evaluation using WMT16 metrics shared task set indicates that our WE_WPI achieves the highest correlation with human judgment among several representative metrics.

pdf abs
On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing
Wasi Ahmad | Zhisong Zhang | Xuezhe Ma | Eduard Hovy | Kai-Wei Chang | Nanyun Peng
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Different languages might have different word orders. In this paper, we investigate crosslingual transfer and posit that an orderagnostic model will perform better when transferring to distant foreign languages. To test our hypothesis, we train dependency parsers on an English corpus and evaluate their transfer performance on 30 other languages. Specifically, we compare encoders and decoders based on Recurrent Neural Networks (RNNs) and modified self-attentive architectures. The former relies on sequential information while the latter is more flexible at modeling word order. Rigorous experiments and detailed analysis shows that RNN-based architectures transfer well to languages that are close to English, while self-attentive models have better overall cross-lingual transferability and perform especially well on distant languages.

pdf abs
Iterative Search for Weakly Supervised Semantic Parsing
Pradeep Dasigi | Matt Gardner | Shikhar Murty | Luke Zettlemoyer | Eduard Hovy
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Training semantic parsers from question-answer pairs typically involves searching over an exponentially large space of logical forms, and an unguided search can easily be misled by spurious logical forms that coincidentally evaluate to the correct answer. We propose a novel iterative training algorithm that alternates between searching for consistent logical forms and maximizing the marginal likelihood of the retrieved ones. This training scheme lets us iteratively train models that provide guidance to subsequent ones to search for logical forms of increasing complexity, thus dealing with the problem of spuriousness. We evaluate these techniques on two hard datasets: WikiTableQuestions (WTQ) and Cornell Natural Language Visual Reasoning (NLVR), and show that our training algorithm outperforms the previous best systems, on WTQ in a comparable setting, and on NLVR with significantly less supervision.

pdf abs
Let’s Make Your Request More Persuasive: Modeling Persuasive Strategies via Semi-Supervised Neural Nets on Crowdfunding Platforms
Diyi Yang | Jiaao Chen | Zichao Yang | Dan Jurafsky | Eduard Hovy
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Modeling what makes a request persuasive - eliciting the desired response from a reader - is critical to the study of propaganda, behavioral economics, and advertising. Yet current models can’t quantify the persuasiveness of requests or extract successful persuasive strategies. Building on theories of persuasion, we propose a neural network to quantify persuasiveness and identify the persuasive strategies in advocacy requests. Our semi-supervised hierarchical neural network model is supervised by the number of people persuaded to take actions and partially supervised at the sentence level with human-labeled rhetorical strategies. Our method outperforms several baselines, uncovers persuasive strategies - offering increased interpretability of persuasive speech - and has applications for other situations with document-level supervision but only partial sentence supervision.

pdf abs
(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas
Dongyeop Kang | Varun Gangal | Eduard Hovy
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Stylistic variation in text needs to be studied with different aspects including the writer’s personal traits, interpersonal relations, rhetoric, and more. Despite recent attempts on computational modeling of the variation, the lack of parallel corpora of style language makes it difficult to systematically control the stylistic change as well as evaluate such models. We release PASTEL, the parallel and annotated stylistic language dataset, that contains ~41K parallel sentences (8.3K parallel stories) annotated across different personas. Each persona has different styles in conjunction: gender, age, country, political view, education, ethnic, and time-of-writing. The dataset is collected from human annotators with solid control of input denotation: not only preserving original meaning between text, but promoting stylistic diversity to annotators. We test the dataset on two interesting applications of style language, where PASTEL helps design appropriate experiment and evaluation. First, in predicting a target style (e.g., male or female in gender) given a text, multiple styles of PASTEL make other external style variables controlled (or fixed), which is a more accurate experimental design. Second, a simple supervised model with our parallel text outperforms the unsupervised models using nonparallel text in style transfer. Our dataset is publicly available.

pdf abs
Earlier Isn’t Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization
Taehee Jung | Dongyeop Kang | Lucas Mentch | Eduard Hovy
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Despite the recent developments on neural summarization systems, the underlying logic behind the improvements from the systems and its corpus-dependency remains largely unexplored. Position of sentences in the original text, for example, is a well known bias for news summarization. Following in the spirit of the claim that summarization is a combination of sub-functions, we define three sub-aspects of summarization: position, importance, and diversity and conduct an extensive analysis of the biases of each sub-aspect with respect to the domain of nine different summarization corpora (e.g., news, academic papers, meeting minutes, movie script, books, posts). We find that while position exhibits substantial bias in news articles, this is not the case, for example, with academic papers and meeting minutes. Furthermore, our empirical study shows that different types of summarization systems (e.g., neural-based) are composed of different degrees of the sub-aspects. Our study provides useful lessons regarding consideration of underlying sub-aspects when collecting a new summarization dataset or developing a new system.

pdf abs
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow
Xuezhe Ma | Chunting Zhou | Xian Li | Graham Neubig | Eduard Hovy
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures accuracy lags significantly behind autoregressive models. In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique to model complex distributions using neural networks, and design several layers of flow tailored for modeling the conditional density of sequential latent variables. We evaluate this model on three neural machine translation (NMT) benchmark datasets, achieving comparable performance with state-of-the-art non-autoregressive NMT models and almost constant decoding time w.r.t the sequence length.

pdf abs
Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs
Dongyeop Kang | Eduard Hovy
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating a long, coherent text such as a paragraph requires a high-level control of different levels of relations between sentences (e.g., tense, coreference). We call such a logical connection between sentences as a (paragraph) flow. In order to produce a coherent flow of text, we explore two forms of intersentential relations in a paragraph: one is a human-created linguistical relation that forms a structure (e.g., discourse tree) and the other is a relation from latent representation learned from the sentences themselves. Our two proposed models incorporate each form of relations into document-level language models: the former is a supervised model that jointly learns a language model as well as discourse relation prediction, and the latter is an unsupervised model that is hierarchically conditioned by a recurrent neural network (RNN) over the latent information. Our proposed models with both forms of relations outperform the baselines in partially conditioned paragraph generation task. Our codes and data are publicly available.

pdf abs
Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification
Vaibhav Vaibhav | Raghuram Mandyam | Eduard Hovy
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

The rising growth of fake news and misleading information through online media outlets demands an automatic method for detecting such news articles. Of the few limited works which differentiate between trusted vs other types of news article (satire, propaganda, hoax), none of them model sentence interactions within a document. We observe an interesting pattern in the way sentences interact with each other across different kind of news articles. To capture this kind of information for long news articles, we propose a graph neural network-based model which does away with the need of feature engineering for fine grained fake news classification. Through experiments, we show that our proposed method beats strong neural baselines and achieves state-of-the-art accuracy on existing datasets. Moreover, we establish the generalizability of our model by evaluating its performance in out-of-domain scenarios. Code is available at https://github.com/MysteryVaibhav/fake_news_semantics.

To ensure readability, text is often written and presented with due formatting. These text formatting devices help the writer to effectively convey the narrative. At the same time, these help the readers pick up the structure of the discourse and comprehend the conveyed information. There have been a number of linguistic theories on discourse structure of text. However, these theories only consider unformatted text. Multimedia text contains rich formatting features that can be leveraged for various NLP tasks. In this article, we study some of these discourse features in multimedia text and what communicative function they fulfill in the context. As a case study, we use these features to harvest structured subject knowledge of geometry from textbooks. We conclude that the discourse and text layout features provide information that is complementary to lexical semantic information. Finally, we show that the harvested structured knowledge can be used to improve an existing solver for geometry problems, making it more accurate as well as more explainable.

pdf bib abs
A Cascade Model for Proposition Extraction in Argumentation
Yohan Jo | Jacky Visser | Chris Reed | Eduard Hovy
Proceedings of the 6th Workshop on Argument Mining

We present a model to tackle a fundamental but understudied problem in computational argumentation: proposition extraction. Propositions are the basic units of an argument and the primary building blocks of most argument mining systems. However, they are usually substituted by argumentative discourse units obtained via surface-level text segmentation, which may yield text segments that lack semantic information necessary for subsequent argument mining processes. In contrast, our cascade model aims to extract complete propositions by handling anaphora resolution, text segmentation, reported speech, questions, imperatives, missing subjects, and revision. We formulate each task as a computational problem and test various models using a corpus of the 2016 U.S. presidential debates. We show promising performance for some tasks and discuss main challenges in proposition extraction.

Domain adaptation remains one of the most challenging aspects in the wide-spread use of Semantic Role Labeling (SRL) systems. Current state-of-the-art methods are typically trained on large-scale datasets, but their performances do not directly transfer to low-resource domain-specific settings. In this paper, we propose two approaches for domain adaptation in the biological domain that involves pre-training LSTM-CRF based on existing large-scale datasets and adapting it for a low-resource corpus of biological processes. Our first approach defines a mapping between the source labels and the target labels, and the other approach modifies the final CRF layer in sequence-labeling neural network architecture. We perform our experiments on ProcessBank dataset which contains less than 200 paragraphs on biological processes. We improve over the previous state-of-the-art system on this dataset by 21 F1 points. We also show that, by incorporating event-event relationship in ProcessBank, we are able to achieve an additional 2.6 F1 gain, giving us possible insights into how to improve SRL systems for biological process using richer annotations.

2018

pdf abs
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
Dongyeop Kang | Waleed Ammar | Bhavana Dalvi | Madeleine van Zuylen | Sebastian Kohlmeier | Eduard Hovy | Roy Schwartz
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1),1 providing an opportunity to study this important artifact. The dataset consists of 14.7K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR. The dataset also includes 10.7K textual peer reviews written by experts for a subset of the papers. We describe the data collection process and report interesting observed phenomena in the peer reviews. We also propose two novel NLP tasks based on this dataset and provide simple baseline models. In the first task, we show that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline. In the second task, we predict the numerical scores of review aspects and show that simple models can outperform the mean baseline for aspects with high variance such as ‘originality’ and ‘impact’.

We introduce a novel architecture for dependency parsing: stack-pointer networks (StackPtr). Combining pointer networks (Vinyals et al., 2015) with an internal stack, the proposed model first reads and encodes the whole sentence, then builds the dependency tree top-down (from root-to-leaf) in a depth-first fashion. The stack tracks the status of the depth-first search and the pointer networks select one child for the word at the top of the stack at each step. The StackPtr parser benefits from the information of whole sentence and all previously derived subtree structures, and removes the left-to-right restriction in classical transition-based parsers. Yet the number of steps for building any (non-projective) parse tree is linear in the length of the sentence just as other transition-based parsers, yielding an efficient decoding algorithm with O(n²) time complexity. We evaluate our model on 29 treebanks spanning 20 languages and different dependency annotation schemas, and achieve state-of-the-art performances on 21 of them

pdf abs
Learning to Generate Move-by-Move Commentary for Chess Games from Large-Scale Social Forum Data
Harsh Jhamtani | Varun Gangal | Eduard Hovy | Graham Neubig | Taylor Berg-Kirkpatrick
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper examines the problem of generating natural language descriptions of chess games. We introduce a new large-scale chess commentary dataset and propose methods to generate commentary for individual moves in a chess game. The introduced dataset consists of more than 298K chess move-commentary pairs across 11K chess games. We highlight how this task poses unique research challenges in natural language generation: the data contain a large variety of styles of commentary and frequently depend on pragmatic context. We benchmark various baselines and propose an end-to-end trainable neural model which takes into account multiple pragmatic aspects of the game state that may be commented upon to describe a given chess move. Through a human study on predictions for a subset of the data which deals with direct move descriptions, we observe that outputs from our models are rated similar to ground truth commentary texts in terms of correctness and fluency.

pdf abs
From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction
Zihang Dai | Qizhe Xie | Eduard Hovy
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this work, we study the credit assignment problem in reward augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and the entropy regularized reinforcement learning. Inspired by the connection, we propose two sequence prediction algorithms, one extending RAML with fine-grained credit assignment and the other improving Actor-Critic with a systematic entropy regularization. On two benchmark datasets, we show the proposed algorithms outperform RAML and Actor-Critic respectively, providing new alternatives to sequence prediction.

pdf abs
AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples
Dongyeop Kang | Tushar Khot | Ashish Sabharwal | Eduard Hovy
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We consider the problem of learning textual entailment models with limited supervision (5K-10K training examples), and present two complementary approaches for it. First, we propose knowledge-guided adversarial example generators for incorporating large lexical resources in entailment models via only a handful of rule templates. Second, to make the entailment model—a discriminator—more robust, we propose the first GAN-style approach for training it using a natural language example generator that iteratively adjusts to the discriminator’s weaknesses. We demonstrate effectiveness using two entailment datasets, where the proposed methods increase accuracy by 4.7% on SciTail and by 2.8% on a 1% sub-sample of SNLI. Notably, even a single hand-written rule, negate, improves the accuracy of negation examples in SNLI by 6.1%.

The use of machine learning for NLP generally requires resources for training. Tasks performed in a low-resource language usually rely on labeled data in another, typically resource-rich, language. However, there might not be enough labeled data even in a resource-rich language such as English. In such cases, one approach is to use a hand-crafted approach that utilizes only a small bilingual dictionary with minimal manual verification to create distantly supervised data. Another is to explore typical machine learning techniques, for example adversarial training of bilingual word representations. We find that in event-type detection task—the task to classify [parts of] documents into a fixed set of labels—they give about the same performance. We explore ways in which the two methods can be complementary and also see how to best utilize a limited budget for manual annotation to maximize performance gain.

pdf abs
Graph Based Decoding for Event Sequencing and Coreference Resolution
Zhengzhong Liu | Teruko Mitamura | Eduard Hovy
Proceedings of the 27th International Conference on Computational Linguistics

Events in text documents are interrelated in complex ways. In this paper, we study two types of relation: Event Coreference and Event Sequencing. We show that the popular tree-like decoding structure for automated Event Coreference is not suitable for Event Sequencing. To this end, we propose a graph-based decoding algorithm that is applicable to both tasks. The new decoding algorithm supports flexible feature sets for both tasks. Empirically, our event coreference system has achieved state-of-the-art performance on the TAC-KBP 2015 event coreference task and our event sequencing system beats a strong temporal-based, oracle-informed baseline. We discuss the challenges of studying these event relations.

pdf bib
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)
Goran Glavaš | Swapna Somasundaran | Martin Riedl | Eduard Hovy
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)

pdf abs
Automatic Event Salience Identification
Zhengzhong Liu | Chenyan Xiong | Teruko Mitamura | Eduard Hovy
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Identifying the salience (i.e. importance) of discourse units is an important task in language understanding. While events play important roles in text documents, little research exists on analyzing their saliency status. This paper empirically studies Event Salience and proposes two salience detection models based on discourse relations. The first is a feature based salience model that incorporates cohesion among discourse units. The second is a neural model that captures more complex interactions between discourse units. In our new large-scale event salience corpus, both methods significantly outperform the strong frequency baseline, while our neural model further improves the feature based one by a large margin. Our analyses demonstrate that our neural model captures interesting connections between salience and discourse unit relations (e.g., scripts and frame structures).

pdf abs
Large-scale Cloze Test Dataset Created by Teachers
Qizhe Xie | Guokun Lai | Zihang Dai | Eduard Hovy
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Cloze tests are widely adopted in language exams to evaluate students’ language proficiency. In this paper, we propose the first large-scale human-created cloze test dataset CLOTH, containing questions used in middle-school and high-school language exams. With missing blanks carefully created by teachers and candidate choices purposely designed to be nuanced, CLOTH requires a deeper language understanding and a wider attention span than previously automatically-generated cloze datasets. We test the performance of dedicatedly designed baseline models including a language model trained on the One Billion Word Corpus and show humans outperform them by a significant margin. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability of comprehending the long-term context to be the key bottleneck.

2017

pdf abs
Embedded Semantic Lexicon Induction with Joint Global and Local Optimization
Sujay Kumar Jauhar | Eduard Hovy
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Creating annotated frame lexicons such as PropBank and FrameNet is expensive and labor intensive. We present a method to induce an embedded frame lexicon in an minimally supervised fashion using nothing more than unlabeled predicate-argument word pairs. We hypothesize that aggregating such pair selectional preferences across training leads us to a global understanding that captures predicate-argument frame structure. Our approach revolves around a novel integration between a predictive embedding model and an Indian Buffet Process posterior regularizer. We show, through our experimental evaluation, that we outperform baselines on two tasks and can learn an embedded frame lexicon that is able to capture some interesting generalities in relation to hand-crafted semantic frames.

pdf abs
Neural Probabilistic Model for Non-projective MST Parsing
Xuezhe Ma | Eduard Hovy
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper, we propose a probabilistic parsing model that defines a proper conditional probability distribution over non-projective dependency trees for a given sentence, using neural representations as inputs. The neural network architecture is based on bi-directional LSTMCNNs, which automatically benefits from both word- and character-level representations, by using a combination of bidirectional LSTMs and CNNs. On top of the neural network, we introduce a probabilistic structured layer, defining a conditional log-linear model over non-projective trees. By exploiting Kirchhoff’s Matrix-Tree Theorem (Tutte, 1984), the partition functions and marginals can be computed efficiently, leading to a straightforward end-to-end model training procedure via back-propagation. We evaluate our model on 17 different datasets, across 14 different languages. Our parser achieves state-of-the-art parsing performance on nine datasets.

pdf abs
STCP: Simplified-Traditional Chinese Conversion and Proofreading
Jiarui Xu | Xuezhe Ma | Chen-Tse Tsai | Eduard Hovy
Proceedings of the IJCNLP 2017, System Demonstrations

This paper aims to provide an effective tool for conversion between Simplified Chinese and Traditional Chinese. We present STCP, a customizable system comprising statistical conversion model, and proofreading web interface. Experiments show that our system achieves comparable character-level conversion performance with the state-of-art systems. In addition, our proofreading interface can effectively support diagnostics and data annotation. STCP is available at http://lagos.lti.cs.cmu.edu:8002/

pdf abs
RACE: Large-scale ReAding Comprehension Dataset From Examinations
Guokun Lai | Qizhe Xie | Hanxiao Liu | Yiming Yang | Eduard Hovy
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from the English exams for middle and high school Chinese students in the age range between 12 to 18, RACE consists of near 28,000 passages and near 100,000 questions generated by human experts (English instructors), and covers a variety of topics which are carefully designed for evaluating the students’ ability in understanding and reasoning. In particular, the proportion of questions that requires reasoning is much larger in RACE than that in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of the state-of-the-art models (43%) and the ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at http://www.cs.cmu.edu/~glai1/data/race/and the code is available at https://github.com/qizhex/RACE_AR_baselines.

pdf abs
Identifying Semantic Edit Intentions from Revisions in Wikipedia
Diyi Yang | Aaron Halfaker | Robert Kraut | Eduard Hovy
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Most studies on human editing focus merely on syntactic revision operations, failing to capture the intentions behind revision changes, which are essential for facilitating the single and collaborative writing process. In this work, we develop in collaboration with Wikipedia editors a 13-category taxonomy of the semantic intention behind edits in Wikipedia articles. Using labeled article edits, we build a computational classifier of intentions that achieved a micro-averaged F1 score of 0.621. We use this model to investigate edit intention effectiveness: how different types of edits predict the retention of newcomers and changes in the quality of articles, two key concerns for Wikipedia today. Our analysis shows that the types of edits that users make in their first session predict their subsequent survival as Wikipedia editors, and articles in different stages need different types of edits.

pdf abs
Detecting and Explaining Causes From Text For a Time Series Event
Dongyeop Kang | Varun Gangal | Ang Lu | Zheng Chen | Eduard Hovy
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Explaining underlying causes or effects about events is a challenging but valuable task. We define a novel problem of generating explanations of a time series event by (1) searching cause and effect relationships of the time series with textual data and (2) constructing a connecting chain between them to generate an explanation. To detect causal features from text, we propose a novel method based on the Granger causality of time series between features extracted from text such as N-grams, topics, sentiments, and their composition. The generation of the sequence of causal entities requires a commonsense causative knowledge base with efficient reasoning. To ensure good interpretability and appropriate lexical usage we combine symbolic and neural representations, using a neural reasoning algorithm trained on commonsense causal tuples to predict the next cause step. Our quantitative and human analysis show empirical evidence that our method successfully extracts meaningful causality relationships between time series with textual features and generates appropriate explanation between them.

pdf abs
Charmanteau: Character Embedding Models For Portmanteau Creation
Varun Gangal | Harsh Jhamtani | Graham Neubig | Eduard Hovy | Eric Nyberg
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Portmanteaus are a word formation phenomenon where two words combine into a new word. We propose character-level neural sequence-to-sequence (S2S) methods for the task of portmanteau generation that are end-to-end-trainable, language independent, and do not explicitly use additional phonetic information. We propose a noisy-channel-style model, which allows for the incorporation of unsupervised word lists, improving performance over a standard source-to-target model. This model is made possible by an exhaustive candidate generation strategy specifically enabled by the features of the portmanteau task. Experiments find our approach superior to a state-of-the-art FST-based baseline with respect to ground truth accuracy and human evaluation.

pdf bib
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing
Martin Riedl | Swapna Somasundaran | Goran Glavaš | Eduard Hovy
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing

pdf abs
Event Detection Using Frame-Semantic Parser
Evangelia Spiliopoulou | Eduard Hovy | Teruko Mitamura
Proceedings of the Events and Stories in the News Workshop

Recent methods for Event Detection focus on Deep Learning for automatic feature generation and feature ranking. However, most of those approaches fail to exploit rich semantic information, which results in relatively poor recall. This paper is a small & focused contribution, where we introduce an Event Detection and classification system, based on deep semantic information retrieved from a frame-semantic parser. Our experiments show that our system achieves higher recall than state-of-the-art systems. Further, we claim that enhancing our system with deep learning techniques like feature ranking can achieve even better results, as it can benefit from both approaches.

pdf abs
Huntsville, hospitals, and hockey teams: Names can reveal your location
Bahar Salehi | Dirk Hovy | Eduard Hovy | Anders Søgaard
Proceedings of the 3rd Workshop on Noisy User-generated Text

Geolocation is the task of identifying a social media user’s primary location, and in natural language processing, there is a growing literature on to what extent automated analysis of social media posts can help. However, not all content features are equally revealing of a user’s location. In this paper, we evaluate nine name entity (NE) types. Using various metrics, we find that GEO-LOC, FACILITY and SPORT-TEAM are more informative for geolocation than other NE types. Using these types, we improve geolocation accuracy and reduce distance error over various famous text-based methods.

pdf bib abs
Shakespearizing Modern Language Using Copy-Enriched Sequence to Sequence Models
Harsh Jhamtani | Varun Gangal | Eduard Hovy | Eric Nyberg
Proceedings of the Workshop on Stylistic Variation

Variations in writing styles are commonly used to adapt the content to a specific context, audience, or purpose. However, applying stylistic variations is still by and large a manual process, and there have been little efforts towards automating it. In this paper we explore automated methods to transform text from modern English to Shakespearean English using an end to end trainable neural model with pointers to enable copy action. To tackle limited amount of parallel data, we pre-train embeddings of words by leveraging external dictionaries mapping Shakespearean words to modern English words as well as additional text. Our methods are able to get a BLEU score of 31+, an improvement of ≈ 6 points above the strongest baseline. We publicly release our code to foster further research in this area.

pdf abs
Finding Structure in Figurative Language: Metaphor Detection with Topic-based Frames
Hyeju Jang | Keith Maki | Eduard Hovy | Carolyn Rosé
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

In this paper, we present a novel and highly effective method for induction and application of metaphor frame templates as a step toward detecting metaphor in extended discourse. We infer implicit facets of a given metaphor frame using a semi-supervised bootstrapping approach on an unlabeled corpus. Our model applies this frame facet information to metaphor detection, and achieves the state-of-the-art performance on a social media dataset when building upon other proven features in a nonlinear machine learning model. In addition, we illustrate the mechanism through which the frame and topic information enable the more accurate metaphor detection.

pdf abs
An Interpretable Knowledge Transfer Model for Knowledge Base Completion
Qizhe Xie | Xuezhe Ma | Zihang Dai | Eduard Hovy
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge bases are important resources for a variety of natural language processing tasks but suffer from incompleteness. We propose a novel embedding model, ITransF, to perform knowledge base completion. Equipped with a sparse attention mechanism, ITransF discovers hidden concepts of relations and transfer statistical strength through the sharing of concepts. Moreover, the learned associations between relations and concepts, which are represented by sparse attention vectors, can be interpreted easily. We evaluate ITransF on two benchmark datasets—WN18 and FB15k for knowledge base completion and obtains improvements on both the mean rank and Hits@10 metrics, over all baselines that do not use additional information.

pdf abs
Ontology-Aware Token Embeddings for Prepositional Phrase Attachment
Pradeep Dasigi | Waleed Ammar | Chris Dyer | Eduard Hovy
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddings in a model for predicting prepositional phrase (PP) attachments and jointly learn the concept embeddings and model parameters. We show that using context-sensitive embeddings improves the accuracy of the PP attachment model by 5.4% absolute points, which amounts to a 34.4% relative reduction in errors.

2016

pdf
Visualizing and Understanding Neural Models in NLP
Jiwei Li | Xinlei Chen | Eduard Hovy | Dan Jurafsky
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Unsupervised Ranking Model for Entity Coreference Resolution
Xuezhe Ma | Zhengzhong Liu | Eduard Hovy
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Hierarchical Attention Networks for Document Classification
Zichao Yang | Diyi Yang | Chris Dyer | Xiaodong He | Alex Smola | Eduard Hovy
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the Fourth Workshop on Events
Martha Palmer | Ed Hovy | Teruko Mitamura | Tim O’Gorman
Proceedings of the Fourth Workshop on Events

pdf
Unsupervised Event Coreference for Abstract Words
Dheeraj Rajagopal | Eduard Hovy | Teruko Mitamura
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

pdf abs
Edit Categories and Editor Role Identification in Wikipedia
Diyi Yang | Aaron Halfaker | Robert Kraut | Eduard Hovy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this work, we introduced a corpus for categorizing edit types in Wikipedia. This fine-grained taxonomy of edit types enables us to differentiate editing actions and find editor roles in Wikipedia based on their low-level edit types. To do this, we first created an annotated corpus based on 1,996 edits obtained from 953 article revisions and built machine-learning models to automatically identify the edit categories associated with edits. Building on this automated measurement of edit types, we then applied a graphical model analogous to Latent Dirichlet Allocation to uncover the latent roles in editors’ edit histories. Applying this technique revealed eight different roles editors play, such as Social Networker, Substantive Expert, etc.

pdf
Tables as Semi-structured Knowledge for Question Answering
Sujay Kumar Jauhar | Peter Turney | Eduard Hovy
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Xuezhe Ma | Eduard Hovy
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Harnessing Deep Neural Networks with Logic Rules
Zhiting Hu | Xuezhe Ma | Zhengzhong Liu | Eduard Hovy | Eric Xing
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf
Ontologically Grounded Multi-sense Representation Learning for Semantic Vector Space Models
Sujay Kumar Jauhar | Chris Dyer | Eduard Hovy
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Retrofitting Word Vectors to Semantic Lexicons
Manaal Faruqui | Jesse Dodge | Sujay Kumar Jauhar | Chris Dyer | Eduard Hovy | Noah A. Smith
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Efficient Inner-to-outer Greedy Algorithm for Higher-order Labeled Dependency Parsing
Xuezhe Ma | Eduard Hovy
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
When Are Tree Structures Necessary for Deep Learning of Representations?
Jiwei Li | Thang Luong | Dan Jurafsky | Eduard Hovy
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
Humor Recognition and Humor Anchor Extraction
Diyi Yang | Alon Lavie | Chris Dyer | Eduard Hovy
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation
Eduard Hovy | Teruko Mitamura | Martha Palmer
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

pdf bib
Word Sense Disambiguation via PropStore and OntoNotes for Event Mention Detection
Nicolas R. Fauceglia | Yiu-Chang Lin | Xuezhe Ma | Eduard Hovy
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

pdf
Evaluation Algorithms for Event Nugget Detection : A Pilot Study
Zhengzhong Liu | Teruko Mitamura | Eduard Hovy
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

2014

pdf abs
A Corpus of Participant Roles in Contentious Discussions
Siddharth Jain | Archna Bhatia | Angelique Rein | Eduard Hovy
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The expansion of social roles is, nowadays, a fact due to the ability of users to interact, discuss, exchange ideas and opinions, and form social networks though social media. Users in online social environment play a variety of social roles. The concept of “social role” has long been used in social science describe the intersection of behavioural, meaningful, and structural attributes that emerge regularly in particular settings. In this paper, we present a new corpus for social roles in online contentious discussions. We explore various behavioural attributes such as stubbornness, sensibility, influence, and ignorance to create a model of social roles to distinguish among various social roles participants assume in such setup. We annotate discussions drawn from two different sets of corpora in order to ensure that our model of social roles and their signals hold up in general. We discuss the various criteria for deciding values for each behavioural attributes which define the roles.

pdf abs
Supervised Within-Document Event Coreference using Information Propagation
Zhengzhong Liu | Jun Araki | Eduard Hovy | Teruko Mitamura
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Event coreference is an important task for full text analysis. However, previous work uses a variety of approaches, sources and evaluation, making the literature confusing and the results incommensurate. We provide a description of the differences to facilitate future research. Second, we present a supervised method for event coreference resolution that uses a rich feature set and propagates information alternatively between events and their arguments, adapting appropriately for each type of argument.

pdf abs
Detecting Subevent Structure for Event Coreference Resolution
Jun Araki | Zhengzhong Liu | Eduard Hovy | Teruko Mitamura
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In the task of event coreference resolution, recent work has shown the need to perform not only full coreference but also partial coreference of events. We show that subevents can form a particular hierarchical event structure. This paper examines a novel two-stage approach to finding and improving subevent structures. First, we introduce a multiclass logistic regression model that can detect subevent relations in addition to full coreference. Second, we propose a method to improve subevent structure based on subevent clusters detected by the model. Using a corpus in the Intelligence Community domain, we show that the method achieves over 3.2 BLANC F1 gain in detecting subevent relations against the logistic regression model.

pdf
Metaphor Detection through Term Relevance
Marc Schulder | Eduard Hovy
Proceedings of the Second Workshop on Metaphor in NLP

pdf bib
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation
Teruko Mitamura | Eduard Hovy | Martha Palmer
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation

pdf
Evaluation for Partial Event Coreference
Jun Araki | Eduard Hovy | Teruko Mitamura
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation

pdf
Application of Prize based on Sentence Length in Chunk-based Automatic Evaluation of Machine Translation
Hiroshi Echizen’ya | Kenji Araki | Eduard Hovy
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf
Sentiment Analysis on the People’s Daily
Jiwei Li | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Major Life Event Extraction from Twitter based on Congratulations/Condolences Speech Acts
Jiwei Li | Alan Ritter | Claire Cardie | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
A Model of Coherence Based on Distributed Sentence Representation
Jiwei Li | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Recursive Deep Models for Discourse Parsing
Jiwei Li | Rumeng Li | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Weakly Supervised User Profile Extraction from Twitter
Jiwei Li | Alan Ritter | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Vector space semantics with frequency-driven motifs
Shashank Srivastava | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Towards a General Rule for Identifying Deceptive Opinion Spam
Jiwei Li | Myle Ott | Claire Cardie | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
An Extension of BLANC to System Mentions
Xiaoqiang Luo | Sameer Pradhan | Marta Recasens | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation
Sameer Pradhan | Xiaoqiang Luo | Marta Recasens | Eduard Hovy | Vincent Ng | Michael Strube
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
Inducing Latent Semantic Relations for Structured Distributional Semantics
Sujay Kumar Jauhar | Eduard Hovy
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf
Unsupervised Word Sense Induction using Distributional Statistics
Kartik Goyal | Eduard Hovy
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf
Modeling Newswire Events using Neural Networks for Anomaly Detection
Pradeep Dasigi | Eduard Hovy
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf
A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
Shashank Srivastava | Dirk Hovy | Eduard Hovy
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf
Automatic Interpretation of the English Possessive
Stephen Tratz | Eduard Hovy
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
A Structured Distributional Semantic Model for Event Co-reference
Kartik Goyal | Sujay Kumar Jauhar | Huiying Li | Mrinmaya Sachan | Shashank Srivastava | Eduard Hovy
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Squibs: What Is a Paraphrase?
Rahul Bhagat | Eduard Hovy
Computational Linguistics, Volume 39, Issue 3 - September 2013

pdf
Learning Whom to Trust with MACE
Dirk Hovy | Taylor Berg-Kirkpatrick | Ashish Vaswani | Eduard Hovy
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Workshop on Events: Definition, Detection, Coreference, and Representation
Eduard Hovy | Teruko Mitamura | Martha Palmer
Workshop on Events: Definition, Detection, Coreference, and Representation

pdf
Events are Not Simple: Identity, Non-Identity, and Quasi-Identity
Eduard Hovy | Teruko Mitamura | Felisa Verdejo | Jun Araki | Andrew Philpot
Workshop on Events: Definition, Detection, Coreference, and Representation

pdf
A Structured Distributional Semantic Model : Integrating Structure with Semantics
Kartik Goyal | Sujay Kumar Jauhar | Huiying Li | Mrinmaya Sachan | Shashank Srivastava | Eduard Hovy
Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality

pdf
Automatic Evaluation Metric for Machine Translation that is Independent of Sentence Length
Hiroshi Echizen’ya | Kenji Araki | Eduard Hovy
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

pdf
Structured Event Retrieval over Microblog Archives
Donald Metzler | Congxing Cai | Eduard Hovy
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Exploiting Partial Annotations with EM Training
Dirk Hovy | Eduard Hovy
Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

pdf bib
Optimization for Efficient Determination of Chunk in Automatic Evaluation for Machine Translation
Hiroshi Echizen’ya | Kenji Araki | Eduard Hovy
Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology

This paper describes a methodology for testing and evaluating the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. The methodology is being used in QA4MRE (QA for Machine Reading Evaluation), one of the labs of CLEF. The task was to answer a series of multiple choice tests, each based on a single document. This allows complex questions to be asked but makes evaluation simple and completely automatic. The evaluation architecture is completely multilingual: test documents, questions, and their answers are identical in all the supported languages. Background text collections are comparable collections harvested from the web for a set of predefined topics. Each test received an evaluation score between 0 and 1 using c@1. This measure encourages systems to reduce the number of incorrect answers while maintaining the number of correct ones by leaving some questions unanswered. 12 groups participated in the task, submitting 62 runs in 3 different languages (German, English, and Romanian). All runs were monolingual; no team attempted a cross-language task. We report here the conclusions and lessons learned after the first campaign in 2011.

2011

pdf
Unsupervised Discovery of Domain-Specific Knowledge from Text
Dirk Hovy | Chunliang Zhang | Eduard Hovy | Anselmo Peñas
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
Insights from Network Structure for Text Mining
Zornitsa Kozareva | Eduard Hovy
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
Models and Training for Unsupervised Preposition Sense Disambiguation
Dirk Hovy | Ashish Vaswani | Stephen Tratz | David Chiang | Eduard Hovy
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques
Donald Metzler | Eduard Hovy | Chunliang Zhang
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A New Semantics: Merging Propositional and Distributional Information
Eduard Hovy
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

pdf
Granularity in Natural Language Discourse
Rutu Mulkar-Mehta | Jerry Hobbs | Eduard Hovy
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

pdf
The Role of Information Extraction in the Design of a Document Triage Application for Biocuration
Sandeep Pokkunuri | Cartic Ramakrishnan | Ellen Riloff | Eduard Hovy | Gully Burns
Proceedings of BioNLP 2011 Workshop

pdf
Contextual Bearing on Linguistic Variation in Social Media
Stephan Gouws | Donald Metzler | Congxing Cai | Eduard Hovy
Proceedings of the Workshop on Language in Social Media (LSM 2011)

pdf bib
Invited Keynote: What are Subjectivity, Sentiment, and Affect?
Eduard Hovy
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

pdf
A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Stephen Tratz | Eduard Hovy
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf
A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
Stephen Tratz | Eduard Hovy
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
Marta Recasens | Eduard Hovy
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
Zornitsa Kozareva | Eduard Hovy
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Annotation
Eduard Hovy
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

pdf
A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web
Zornitsa Kozareva | Eduard Hovy
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf
Not All Seeds Are Equal: Measuring the Quality of Text Mining Seeds
Zornitsa Kozareva | Eduard Hovy
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Summarizing Textual Information about Locations In a Geo-Spatial Information Display System
Congxing Cai | Eduard Hovy
Proceedings of the NAACL HLT 2010 Demonstration Session

pdf
ISI: Automatic Classification of Relations Between Nominals Using a Maximum Entropy Classifier
Stephen Tratz | Eduard Hovy
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading
Rutu Mulkar-Mehta | James Allen | Jerry Hobbs | Eduard Hovy | Bernardo Magnini | Chris Manning
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

pdf
Semantic Enrichment of Text with Background Knowledge
Anselmo Peñas | Eduard Hovy
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

pdf bib
Proceedings of the NAACL HLT 2010 Workshop on Semantic Search
Donghui Feng | Jamie Callan | Eduard Hovy | Marius Pasca
Proceedings of the NAACL HLT 2010 Workshop on Semantic Search

pdf
Injecting Linguistics into NLP through Annotation
Eduard Hovy
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

pdf
Negation and modality in distributional semantics
Ed Hovy
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

pdf bib
Distributional Semantics and the Lexicon
Eduard Hovy
Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon

pdf
What’s in a Preposition? Dimensions of Sense Disambiguation for an Interesting Word Class
Dirk Hovy | Stephen Tratz | Eduard Hovy
Coling 2010: Posters

pdf
Filling Knowledge Gaps in Text for Machine Reading
Anselmo Peñas | Eduard Hovy
Coling 2010: Posters

pdf abs
A Typology of Near-Identity Relations for Coreference (NIDENT)
Marta Recasens | Eduard Hovy | M. Antònia Martí
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The task of coreference resolution requires people or systems to decide when two referring expressions refer to the 'same' entity or event. In real text, this is often a difficult decision because identity is never adequately defined, leading to contradictory treatment of cases in previous work. This paper introduces the concept of 'near-identity', a middle ground category between identity and non-identity, to handle such cases systematically. We present a typology of Near-Identity Relations (NIDENT) that includes fifteen types―grouped under four main families―that capture a wide range of ways in which (near-)coreference relations hold between discourse entities. We validate the theoretical model by annotating a small sample of real data and showing that inter-annotator agreement is high enough for stability (K=0.58, and up to K=0.65 and K=0.84 when leaving out one and two outliers, respectively). This work enables subsequent creation of the first internally consistent language resource of this type through larger annotation efforts.

2009

pdf
Toward Completeness in Concept Extraction and Classification
Eduard Hovy | Zornitsa Kozareva | Ellen Riloff
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf
Adaptive Information Extraction for Complex Biomedical Tasks
Donghui Feng | Gully Burns | Eduard Hovy
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

pdf bib
Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
Liang-Chih Yu | Chung-Hsien Wu | Jui-Feng Yeh | Eduard Hovy
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 4, December 2008

pdf
Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs
Zornitsa Kozareva | Ellen Riloff | Eduard Hovy
Proceedings of ACL-08: HLT

pdf
Learning a Stopping Criterion for Active Learning for Word Sense Disambiguation and Text Classification
Jingbo Zhu | Huizhen Wang | Eduard Hovy
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf
Towards Automated Semantic Analysis on Biomedical Research Articles
Donghui Feng | Gully Burns | Jingbo Zhu | Eduard Hovy
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf abs
A Common Ground for Virtual Humans: Using an Ontology in a Natural Language Oriented Virtual Human Architecture
Arno Hartholt | Thomas Russ | David Traum | Eduard Hovy | Susan Robinson
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

When dealing with large, distributed systems that use state-of-the-art components, individual components are usually developed in parallel. As development continues, the decoupling invariably leads to a mismatch between how these components internally represent concepts and how they communicate these representations to other components: representations can get out of synch, contain localized errors, or become manageable only by a small group of experts for each module. In this paper, we describe the use of an ontology as part of a complex distributed virtual human architecture in order to enable better communication between modules while improving the overall flexibility needed to change or extend the system. We focus on the natural language understanding capabilities of this architecture and the relationship between language and concepts within the entire system in general and the ontology in particular.

pdf
OntoNotes: Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
Liang-Chih Yu | Chung-Hsien Wu | Eduard Hovy
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf
Multi-Criteria-Based Strategy to Stop Active Learning for Data Annotation
Jingbo Zhu | Huizhen Wang | Eduard Hovy
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf
ISP: Learning Inferential Selectional Preferences
Patrick Pantel | Rahul Bhagat | Bonaventura Coppola | Timothy Chklovski | Eduard Hovy
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf
A Semi-Automatic Evaluation Scheme: Automated Nuggetization for Manual Annotation
Liang Zhou | Namhee Kwon | Eduard Hovy
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf
Topic Analysis for Psychiatric Document Retrieval
Liang-Chih Yu | Chung-Hsien Wu | Chin-Yew Lin | Eduard Hovy | Chia-Ling Lin
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Investigating why BLEU penalizes non-statistical systems
Eduard Hovy
Proceedings of the Workshop on Automatic procedures in MT evaluation

pdf
LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules
Rahul Bhagat | Patrick Pantel | Eduard Hovy
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf
Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem
Jingbo Zhu | Eduard Hovy
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf
Extracting Data Records from Unstructured Biomedical Full Text
Donghui Feng | Gully Burns | Eduard Hovy
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf
Crystal: Analyzing Predictive Opinions on the Web
Soo-Min Kim | Eduard Hovy
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf abs
Automated Summarization Evaluation with Basic Elements.
Eduard Hovy | Chin-Yew Lin | Liang Zhou | Junichi Fukumoto
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

As part of evaluating a summary automati-cally, it is usual to determine how much of the contents of one or more human-produced ideal summaries it contains. Past automated methods such as ROUGE compare using fixed word ngrams, which are not ideal for a variety of reasons. In this paper we describe a framework in which summary evaluation measures can be instantiated and compared, and we implement a specific evaluation method using very small units of content, called Basic Elements that address some of the shortcomings of ngrams. This method is tested on DUC 2003, 2004, and 2005 systems and produces very good correlations with human judgments.

pdf abs
Summarizing Answers for Complicated Questions
Liang Zhou | Chin-Yew Lin | Eduard Hovy
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Recent work in several computational linguistics (CL) applications (especially question answering) has shown the value of semantics (in fact, many people argue that the current performance ceiling experienced by so many CL applications derives from their inability to perform any kind of semantic processing). But the absence of a large semantic information repository that provides representations for sentences prevents the training of statistical CL engines and thus hampers the development of such semantics-enabled applications. This talk refers to recent work in several projects that seek to annotate large volumes of text with shallower or deeper representations of some semantic phenomena. It describes one of the essential problemscreating, managing, and annotating (at large scale) the meanings of words, and outlines the Omega ontology, being built at ISI, that acts as term repository. The talk illustrates how one can proceed from words via senses to concepts, and how the annotation process can help verify good concept decisions and expose bad ones. Much of this work is performed in the context of the OntoNotes project, joint with BBN, the Universities of Colorado and Pennsylvania, and ISI, that is working to build a corpus of about 1M words (English, Chinese, and Arabic), annotated for shallow semantics, over the next few years.

This paper describes an effort to investigate the incrementally deepening development of an interlingua notation, validated by human annotation of texts in English plus six languages. We begin with deep syntactic annotation, and in this paper present a series of annotation manuals for six different languages at the deep-syntactic level of representation. Many syntactic differences between languages are removed in the proposed syntactic annotation, making them useful resources for multilingual NLP projects with semantic components.

pdf
Automatic Identification of Pro and Con Reasons in Online Reviews
Soo-Min Kim | Eduard Hovy
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
A Gentle Introduction to Ontologies
Eduard Hovy
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Tutorials

pdf
Identifying and Analyzing Judgment Opinions
Soo-Min Kim | Eduard Hovy
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf
Learning to Detect Conversation Focus of Threaded Discussions
Donghui Feng | Erin Shaw | Jihie Kim | Eduard Hovy
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf
ParaEval: Using Paraphrases to Evaluate Summaries Automatically
Liang Zhou | Chin-Yew Lin | Dragos Stefan Munteanu | Eduard Hovy
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf
OntoNotes: The 90% Solution
Eduard Hovy | Mitchell Marcus | Martha Palmer | Lance Ramshaw | Ralph Weischedel
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text
Soo-Min Kim | Eduard Hovy
Proceedings of the Workshop on Sentiment and Subjectivity in Text

pdf
Re-evaluating Machine Translation Results with Paraphrase Support
Liang Zhou | Chin-Yew Lin | Eduard Hovy
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the Analyzing Conversations in Text and Speech
Eduard Hovy | Klaus Zechner | Liang Zhou
Proceedings of the Analyzing Conversations in Text and Speech

2005

pdf
Automatic Detection of Opinion Bearing Words and Sentences
Soo-Min Kim | Eduard Hovy
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

pdf
The Omega Ontology
Andrew Philpot | Eduard Hovy | Patrick Pantel
Proceedings of OntoLex 2005 - Ontologies and Lexical Resources

pdf
Statistical Shallow Semantic Parsing despite Little Training Data
Rahul Bhagat | Anton Leuski | Eduard Hovy
Proceedings of the Ninth International Workshop on Parsing Technology

pdf
Handling Biographical Questions with Implicature
Donghui Feng | Eduard Hovy
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf
Classummary: Introducing Discussion Summarization to Online Classrooms
Liang Zhou | Erin Shaw | Chin-Yew Lin | Eduard Hovy
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

pdf
Digesting Virtual “Geek” Culture: The Summarization of Technical Internet Relay Chats
Liang Zhou | Eduard Hovy
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf
Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering
Deepak Ravichandran | Patrick Pantel | Eduard Hovy
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

pdf bib
Multi-Document Person Name Resolution
Michael Fleischman | Eduard Hovy
Proceedings of the Conference on Reference Resolution and Its Applications

pdf
Senseval automatic labeling of semantic roles using Maximum Entropy models
Namhee Kwon | Michael Fleischman | Eduard Hovy
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf
Template-Filtered Headline Summarization
Liang Zhou | Eduard Hovy
Text Summarization Branches Out

pdf
Multi-Document Biography Summarization
Liang Zhou | Miruna Ticrea | Eduard Hovy
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf
Towards Terascale Semantic Acquisition
Patrick Pantel | Deepak Ravichandran | Eduard Hovy
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
FrameNet-based Semantic Parsing using Maximum Entropy Models
Namhee Kwon | Michael Fleischman | Eduard Hovy
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
Determining the Sentiment of Opinions
Soo-Min Kim | Eduard Hovy
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language which uses two ontologies and a systematic theta-role list. We present the annotation tools built and outline the annotation process. Following this, we describe our evaluation methodology and conclude with a summary of issues that have arisen.

2003

pdf
Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics
Chin-Yew Lin | Eduard Hovy
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

pdf
A Web-Trained Extraction Summarization System
Liang Zhou | Eduard Hovy
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

pdf
A Maximum Entropy Approach to FrameNet Tagging
Michael Fleischman | Eduard Hovy
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

pdf bib
Offline Strategies for Online Question Answering: Answering Questions Before They Are Asked
Michael Fleischman | Eduard Hovy | Abdessamad Echihabi
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf
iNeATS: Interactive Multi-Document Summarization
Anton Leuski | Chin-Yew Lin | Eduard Hovy
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

pdf
The Potential and Limitations of Automatic Sentence Extraction for Summarization
Chin-Yew Lin | Eduard Hovy
Proceedings of the HLT-NAACL 03 Text Summarization Workshop

pdf
Maximum Entropy Models for FrameNet Classification
Michael Fleischman | Namhee Kwon | Eduard Hovy
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

pdf
Statistical QA - Classifier vs. Re-ranker: What’s the difference?
Deepak Ravichandran | Eduard Hovy | Franz Josef Och
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

Holy and unholy grails
Eduard Hovy | Deepak Ravichandran
Proceedings of Machine Translation Summit IX: Plenaries

pdf abs
BTL: a hybrid model for English-Vietnamese machine translation
Dinh Dien | Kiem Hoang | Eduard Hovy
Proceedings of Machine Translation Summit IX: Papers

Machine Translation (MT) is the most interesting and difficult task which has been posed since the beginning of computer history. The highest difficulty which computers had to face with, is the built-in ambiguity of Natural Languages. Formerly, a lot of human-devised rules have been used to disambiguate those ambiguities. Building such a complete rule-set is time-consuming and labor-intensive task whilst it doesn’t cover all the cases. Besides, when the scale of system increases, it is very difficult to control that rule-set. In this paper, we present a new model of learning-based MT (entitled BTL: Bitext-Transfer Learning) that learns from bilingual corpus to extract disambiguating rules. This model has been experimented in English-to-Vietnamese MT system (EVT) and it gave encouraging results.

pdf abs
FEMTI: creating and using a framework for MT evaluation
Margaret King | Andrei Popescu-Belis | Eduard Hovy
Proceedings of Machine Translation Summit IX: Papers

This paper presents FEMTI, a web-based Framework for the Evaluation of Machine Translation in ISLE. FEMTI offers structured descriptions of potential user needs, linked to an overview of technical characteristics of MT systems. The description of possible systems is mainly articulated around the quality characteristics for software product set out in ISO/IEC standard 9126. Following the philosophy set out there and in the related 14598 series of standards, each quality characteristic bottoms out in metrics which may be applied to a particular instance of a system in order to judge how satisfactory the system is with respect to that characteristic. An evaluator can use the description of user needs to help identify the specific needs of his evaluation and the relations between them. He can then follow the pointers to system description to determine what metrics should be applied and how. In the current state of the framework, emphasis is on being exhaustive, including as much as possible of the information available in the literature on machine translation evaluation. Future work will aim at being more analytic, looking at characteristics and metrics to see how they relate to one another, validating metrics and investigating the correlation between particular metrics and human judgement.

This panel deals with the general topic of evaluation of machine translation systems. The first contribution sets out some recent work on creating standards for the design of evaluations. The second, by Eduard Hovy. takes up the particular issue of how metrics can be differentiated and systematized. Benjamin K. T'sou suggests that whilst men may evaluate machines, machines may also evaluate men. John S. White focuses on the question of the role of the user in evaluation design, and Yusoff Zaharin points out that circumstances and settings may have a major influence on evaluation design.

1998

pdf
Automated Text Summarization and the Summarist System
Eduard Hovy | Chin-Yew Lin
TIPSTER TEXT PROGRAM PHASE III: Proceedings of a Workshop held at Baltimore, Maryland, October 13-15, 1998

Multilingual text summarization
Eduard Hovy | Danel Marcu
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Tutorial Descriptions

bib
A seal of approval for MT systems
Eduard Hovy
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Panel Descriptions

pdf abs
Improving translation quality by manipulating sentence length
Laurie Gerber | Eduard Hovy
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

Translation systems tend to have more trouble with long sentences than with short ones for a variety of reasons. When the source and target languages differ rather markedly, as do Japanese and English, this problem is reflected in lower quality output. To improve readability, we experimented with automatically splitting long sentences into shorter ones. This paper outlines the problem, describes the sentence splitting procedure and rules, and provides an evaluation of the results.

pdf bib
Natural Language Generation
Eduard Hovy
Natural Language Generation

1997

pdf
Identifying Topics by Position
Chin-Yew Lin | Eduard Hovy
Fifth Conference on Applied Natural Language Processing

bib abs
A gentle introduction to MT: theory and current practice
Eduard Hovy
Proceedings of Machine Translation Summit VI: Tutorials

This tutorial provides a nontechnical introduction to machine translation. It reviews the whole scope of MT, outlining briefly its history and the major application areas today, and describing the various kinds of MT techniques that have been invented—from direct replacement through transfer to the holy grail of interlinguas. It briefly outlines the newest statistics-based techniques and provides an introduction to the difficult questions of MT evaluation. Topics include: History and development of MT; Theoretical foundations of MT; Traditional and modern MT techniques; Newest MT research; Thorny questions of evaluating MT systems

pdf
Improving the precision of lexicon-to-ontology alignment algorithms
Latifur R. Khan | Eduard H. Hovy
AMTA/SIG-IL First Workshop on Interlinguas

pdf
MT at the paragraph level: improving English synthesis in SYSTRAN
Eduard Hovy | Laurie Gerber
Proceedings of the 7th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

pdf
Automated Text Summarization in SUMMARIST
Eduard Hovy | ChinYew Lin
Intelligent Scalable Text Summarization