Hal Daumé III

Also published as: Hal Daume, Hal Daume III, Hal Daumé


2023

FairPrism: Evaluating Fairness-Related Harms in Text Generation
Eve Fleisig | Aubrie Amstutz | Chad Atalla | Su Lin Blodgett | Hal Daumé III | Alexandra Olteanu | Emily Sheng | Dan Vann | Hanna Wallach
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It is critical to measure and mitigate fairness-related harms caused by AI text generation systems, including stereotyping and demeaning harms. To that end, we introduce FairPrism, a dataset of 5,000 examples of AI-generated English text with detailed human annotations covering a diverse set of harms relating to gender and sexuality. FairPrism aims to address several limitations of existing datasets for measuring and mitigating fairness-related harms by improving transparency, specifying dataset coverage more clearly, and accounting for annotator disagreement and harms that are context-dependent. FairPrism’s annotations include the extent of stereotyping and demeaning harms, the demographic groups targeted, and appropriateness for different applications. The annotations also include specific harms that occur in interactive contexts and harms that raise normative concerns when the “speaker” is an AI system. Due to its precision and granularity, FairPrism can be used to diagnose (1) the types of fairness-related harms that AI text generation systems cause, and (2) the potential limitations of mitigation methods, both of which we illustrate through case studies. Finally, the process we followed to develop FairPrism offers a recipe for building improved datasets for measuring and mitigating harms caused by AI systems.

Factual or Contextual? Disentangling Error Types in Entity Description Generation
Navita Goyal | Ani Nenkova | Hal Daumé III
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In the task of entity description generation, given a context and a specified entity, a model must describe that entity correctly and in a contextually-relevant way. In this task, as well as broader language generation tasks, the generation of a nonfactual description (factual error) versus an incongruous description (contextual error) is fundamentally different, yet often conflated. We develop an evaluation paradigm that enables us to disentangle these two types of errors in naturally occurring textual contexts. We find that factuality and congruity are often at odds, and that models specifically struggle with accurate descriptions of entities that are less familiar to people. This shortcoming of language models raises concerns around the trustworthiness of such models, since factual errors on less well-known entities are exactly those that a human reader will not recognize.

It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Arjun Subramonian | Xingdi Yuan | Hal Daumé III | Su Lin Blodgett
Findings of the Association for Computational Linguistics: ACL 2023

Progress in NLP is increasingly measured through benchmarks; hence, contextualizing progress requires understanding when and why practitioners may disagree about the validity of benchmarks. We develop a taxonomy of disagreement, drawing on tools from measurement modeling, and distinguish between two types of disagreement: 1) how tasks are conceptualized and 2) how measurements of model performance are operationalized. To provide evidence for our taxonomy, we conduct a meta-analysis of relevant literature to understand how NLP tasks are conceptualized, as well as a survey of practitioners about their impressions of different factors that affect benchmark validity. Our meta-analysis and survey across eight tasks, ranging from coreference resolution to question answering, uncover that tasks are generally not clearly and consistently conceptualized and benchmarks suffer from operationalization disagreements. These findings support our proposed taxonomy of disagreement. Finally, based on our taxonomy, we present a framework for constructing benchmarks and documenting their limitations.

Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models
Lingjun Zhao | Khanh Nguyen | Hal Daumé III
Findings of the Association for Computational Linguistics: ACL 2023

Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability) and (ii) the ability to predict how a listener interprets those utterances and choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener and obtain a significant boost of 11% in success rate in guiding real humans. Our work advocates for having a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.

Which Examples Should be Multiply Annotated? Active Learning When Annotators May Disagree
Connor Baumler | Anna Sotnikova | Hal Daumé III
Findings of the Association for Computational Linguistics: ACL 2023

Linguistic annotations, especially for controversial topics like hate speech detection, are frequently contested due to annotator backgrounds and positionalities. In such situations, preserving this disagreement through the machine learning pipeline can be important for downstream use cases. However, capturing disagreement can increase annotation time and expense. Fortunately, for many tasks, not all examples are equally controversial; we develop an active learning approach, Disagreement Aware Active Learning (DAAL), that concentrates annotations on examples where model entropy and annotator entropy are the most different. Because we cannot know the true entropy of annotations on unlabeled examples, we estimate a model that predicts annotator entropy trained using very few multiply-labeled examples. We find that traditional uncertainty-based active learning underperforms simple passive learning on tasks with high levels of disagreement, but that our active learning approach is able to successfully improve on passive and active baselines, reducing the number of annotations required by at least 24% on average across several datasets.
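
The selection rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the absolute difference is an assumed way of measuring how "different" the two entropies are, and the annotator-entropy predictions are taken as given rather than produced by a trained model.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(model_probs, predicted_annotator_entropy, k):
    """Return the indices of the k examples with the largest gap between
    model entropy and predicted annotator entropy.

    Assumption: the gap is the absolute difference of the two entropies.
    """
    gaps = [
        abs(entropy(p) - h)
        for p, h in zip(model_probs, predicted_annotator_entropy)
    ]
    return sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[:k]


# Made-up numbers: the model is very confident on example 1, but annotators
# are predicted to disagree there, so it is selected first.
model_probs = [[0.5, 0.5], [0.99, 0.01], [0.9, 0.1]]
predicted_annotator_entropy = [0.69, 0.60, 0.30]
chosen = select_for_annotation(model_probs, predicted_annotator_entropy, k=2)
```

In this toy setting, an example where the model is uncertain and annotators are also expected to disagree (example 0) is not prioritized, since extra labels there would not change the picture much.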

2022

Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou | Su Lin Blodgett | Adam Trischler | Hal Daumé III | Kaheer Suleman | Alexandra Olteanu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners’ goals, assumptions, and constraints—which inform decisions about what, when, and how to evaluate—are often partially or implicitly stated, or not stated at all. Combining a formative semi-structured interview study of NLG practitioners (N=18) with a survey study of a broader sample of practitioners (N=61), we surface goals, community practices, assumptions, and constraints that shape NLG evaluations, examining their implications and how they embody ethical considerations.

Theory-Grounded Measurement of U.S. Social Stereotypes in English Language Models
Yang Trista Cao | Anna Sotnikova | Hal Daumé III | Rachel Rudinger | Linda Zou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

NLP models trained on text have been shown to reproduce human stereotypes, which can magnify harms to marginalized groups when systems are deployed at scale. We adapt the Agency-Belief-Communion (ABC) stereotype model of Koch et al. (2016) from social psychology as a framework for the systematic study and discovery of stereotypic group-trait associations in language models (LMs). We introduce the sensitivity test (SeT) for measuring stereotypical associations from language models. To evaluate SeT and other measures using the ABC model, we collect group-trait judgments from U.S.-based subjects to compare with English LM stereotypes. Finally, we extend this framework to measure LM stereotyping of intersectional identities.

Proceedings of the Second Workshop on Bridging Human–Computer Interaction and Natural Language Processing
Su Lin Blodgett | Hal Daumé III | Michael Madaio | Ani Nenkova | Brendan O'Connor | Hanna Wallach | Qian Yang
Proceedings of the Second Workshop on Bridging Human–Computer Interaction and Natural Language Processing

Heterogeneous Supervised Topic Models
Dhanya Sridhar | Hal Daumé III | David Blei
Transactions of the Association for Computational Linguistics, Volume 10

Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic model (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black-box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.

What’s Different between Visual Question Answering for Machine “Understanding” Versus for Accessibility?
Yang Trista Cao | Kyle Seelman | Kyungjun Lee | Hal Daumé III
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine “understanding” and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating a variety of VQA models and examining discrepancies between machine “understanding” datasets (VQA-v2) and accessibility datasets (VizWiz). Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.

2021

Analyzing Stereotypes in Generative Text Inference Tasks
Anna Sotnikova | Yang Trista Cao | Hal Daumé III | Rachel Rudinger
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation
Chen Zhao | Chenyan Xiong | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Open-domain question answering answers a question based on evidence retrieved from a large corpus. State-of-the-art neural approaches require intermediate evidence annotations for training. However, such intermediate annotations are expensive, and methods that rely on them cannot transfer to the more common setting, where only question–answer pairs are available. This paper investigates whether models can learn to find evidence from a large corpus, with only distant supervision from answer labels for model training, thereby generating no additional annotation cost. We introduce a novel approach (DistDR) that iteratively improves over a weak retriever by alternately finding evidence from the up-to-date model and encouraging the model to learn the most likely evidence. Without using any evidence labels, DistDR is on par with fully-supervised state-of-the-art methods on both multi-hop and single-hop QA benchmarks. Our analysis confirms that DistDR finds more accurate evidence over iterations, which leads to model improvements. The code is available at https://github.com/henryzhao5852/DistDR.

Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval
Chen Zhao | Chenyan Xiong | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Complex question answering often requires finding a reasoning chain that consists of multiple evidence pieces. Current approaches incorporate the strengths of structured knowledge and unstructured text, assuming text corpora are semi-structured. Building on dense retrieval methods, we propose a new multi-step retrieval approach (BeamDR) that iteratively forms an evidence chain through beam search in dense representations. When evaluated on multi-hop question answering, BeamDR is competitive with state-of-the-art systems, without using any semi-structured information. Through query composition in dense space, BeamDR captures the implicit relationships between evidence in the reasoning chain. The code is available at https://github.com/henryzhao5852/BeamDR.

Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle
Yang Trista Cao | Hal Daumé III
Computational Linguistics, Volume 47, Issue 3 - November 2021

Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing data sets for trans-exclusionary biases, and develop two new data sets for interrogating bias in both crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service, stereotyping, and over- or under-representation, especially for binary and non-binary trans users.

2020

Active Imitation Learning with Noisy Guidance
Kianté Brantley | Amr Sharaf | Hal Daumé III
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies. Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state; unfortunately, the number of such queries is often prohibitive, frequently rendering these approaches impractical. To combat this query complexity, we consider an active learning setting in which the learning algorithm has additional access to a much cheaper noisy heuristic that provides noisy guidance. Our algorithm, LEAQI, learns a difference classifier that predicts when the expert is likely to disagree with the heuristic, and queries the expert only when necessary. We apply LEAQI to three sequence labelling tasks, demonstrating significantly fewer queries to the expert and comparable (or better) accuracies over a passive approach.
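
The query-saving idea in this abstract can be sketched roughly as follows. This is a simplified illustration, not LEAQI itself: the function names are hypothetical, and the difference classifier is a hand-written rule here, whereas in the paper it is learned.

```python
def label_with_fewer_queries(states, heuristic, expert, predicts_disagreement):
    """Label each state, querying the expensive expert only when the
    difference classifier predicts that it would disagree with the heuristic.

    Returns the chosen labels and the number of expert queries made.
    """
    labels, num_queries = [], 0
    for state in states:
        guess = heuristic(state)
        if predicts_disagreement(state, guess):
            labels.append(expert(state))  # expensive call
            num_queries += 1
        else:
            labels.append(guess)  # trust the cheap heuristic
    return labels, num_queries


# Toy tagging task with made-up rules standing in for the three components.
tokens = ["Paris", "is", "nice", "Hal"]
expert = lambda tok: "ENT" if tok[0].isupper() else "O"
heuristic = lambda tok: "ENT" if tok == "Paris" else "O"
predicts_disagreement = lambda tok, guess: tok[0].isupper() and guess == "O"

labels, num_queries = label_with_fewer_queries(
    tokens, heuristic, expert, predicts_disagreement
)
```

Here the expert is consulted for one token out of four, yet the labels match what the expert would have produced for all of them.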

Toward Gender-Inclusive Coreference Resolution
Yang Trista Cao | Hal Daumé III
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systemic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and develop two new datasets for interrogating bias in crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we build systems that lead to many potential harms.

Language (Technology) is Power: A Critical Survey of “Bias” in NLP
Su Lin Blodgett | Solon Barocas | Hal Daumé III | Hanna Wallach
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We survey 146 papers analyzing “bias” in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing “bias” is an inherently normative process. We further find that these papers’ proposed quantitative techniques for measuring or mitigating “bias” are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing “bias” in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of “bias”—i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements—and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.

Meta-Learning for Few-Shot NMT Adaptation
Amr Sharaf | Hany Hassan | Hal Daumé III
Proceedings of the Fourth Workshop on Neural Generation and Translation

We present META-MT, a meta-learning approach to adapt Neural Machine Translation (NMT) systems in a few-shot setting. META-MT provides a new approach to make NMT models easily adaptable to many target domains with the minimal amount of in-domain data. We frame the adaptation of NMT systems as a meta-learning problem, where we learn to adapt to new unseen domains based on simulated offline meta-training domain adaptation tasks. We evaluate the proposed meta-learning strategy on ten domains with general large scale NMT systems. We show that META-MT significantly outperforms classical domain adaptation when very few in-domain examples are available. Our experiments show that META-MT can outperform classical fine-tuning by up to 2.5 BLEU points after seeing only 4,000 translated words (300 parallel sentences).

On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries
Tianze Shi | Chen Zhao | Jordan Boyd-Graber | Hal Daumé III | Lillian Lee
Findings of the Association for Computational Linguistics: EMNLP 2020

Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce SQUALL, a dataset that enriches 11,276 WIKITABLEQUESTIONS English-language questions with manually created SQL equivalents plus alignments between SQL and question fragments. Our annotation enables new training possibilities for encoder-decoder models, including approaches from machine translation previously precluded by the absence of alignments. We propose and test two methods: (1) supervised attention; (2) adopting an auxiliary objective of disambiguating references in the input queries to table columns. In 5-fold cross validation, these strategies improve over strong baselines by 4.4% execution accuracy. Oracle experiments suggest that annotated alignments can support further accuracy gains of up to 23.9%.

2019

Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Khanh Nguyen | Hal Daumé III
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop “Help, Anna!” (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance. An agent solving tasks in a HANNA environment can leverage simulated human assistants, called ANNA (Automatic Natural Navigation Assistants), which, upon request, provide natural language and visual instructions to direct the agent towards the goals. To address the HANNA problem, we develop a memory-augmented neural agent that hierarchically models multiple levels of decision-making, and an imitation learning algorithm that teaches the agent to avoid repeating past mistakes while simultaneously predicting its own chances of making future progress. Empirically, our approach is able to ask for help more effectively than competitive baselines and, thus, attains higher task success rate on both previously seen and previously unseen environments.

Comparing and Developing Tools to Measure the Readability of Domain-Specific Texts
Elissa Redmiles | Lisa Maszkiewicz | Emily Hwang | Dhruv Kuchhal | Everest Liu | Miraida Morales | Denis Peskov | Sudha Rao | Rock Stevens | Kristina Gligorić | Sean Kross | Michelle Mazurek | Hal Daumé III
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The readability of a digital text can influence people’s ability to learn new things about a range of topics from digital resources (e.g., Wikipedia, WebMD). Readability also impacts search rankings, and is used to evaluate the performance of NLP systems. Despite this, we lack a thorough understanding of how to validly measure readability at scale, especially for domain-specific texts. In this work, we present a comparison of the validity of well-known readability measures and introduce a novel approach, Smart Cloze, which is designed to address shortcomings of existing measures. We compare these approaches across four different corpora: crowdworker-generated stories, Wikipedia articles, security and privacy advice, and health information. On these corpora, we evaluate the convergent and content validity of each measure, and detail tradeoffs in score precision, domain-specificity, and participant burden. These results provide a foundation for more accurate readability measurements and better evaluation of new natural-language-processing systems and tools.

Global Voices: Crossing Borders in Automatic News Summarization
Khanh Nguyen | Hal Daumé III
Proceedings of the 2nd Workshop on New Frontiers in Summarization

We construct Global Voices, a multilingual dataset for evaluating cross-lingual summarization methods. We extract social-network descriptions of Global Voices news articles to cheaply collect evaluation data for into-English and from-English summarization in 15 languages. In particular, for the into-English summarization task, we crowd-source a high-quality evaluation dataset based on guidelines that emphasize accuracy, coverage, and understandability. To ensure the quality of this dataset, we collect human ratings to filter out bad summaries, and conduct a human survey, which shows that the remaining summaries are preferred over the social-network summaries. We study the effect of translation quality in cross-lingual summarization, comparing a translate-then-summarize approach with several baselines. Our results highlight the limitations of the ROUGE metric that are overlooked in monolingual summarization.

Answer-based Adversarial Training for Generating Clarification Questions
Sudha Rao | Hal Daumé III
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present an approach for generating clarification questions with the goal of eliciting new information that would make the given textual context more complete. We propose that modeling hypothetical answers (to clarification questions) as latent variables can guide our approach into generating more useful clarification questions. We develop a Generative Adversarial Network (GAN) where the generator is a sequence-to-sequence model and the discriminator is a utility function that models the value of updating the context with the answer to the clarification question. We evaluate on two datasets, using both automatic metrics and human judgments of usefulness, specificity and relevance, showing that our approach outperforms both a retrieval-based model and ablations that exclude the utility model and the adversarial training.


Controlling the Specificity of Clarification Question Generation
Yang Trista Cao | Sudha Rao | Hal Daumé III
Proceedings of the 2019 Workshop on Widening NLP

Unlike comprehension-style questions, clarification questions look for some missing information in a given context. However, without guidance, neural models for question generation, similar to dialog generation models, lead to generic and bland questions that cannot elicit useful information. We argue that controlling the level of specificity of the generated questions can have useful applications and propose a neural clarification question generation model for the same. We first train a classifier that annotates a clarification question with its level of specificity (generic or specific) to the given context. Our results on the Amazon questions dataset demonstrate that training a clarification question generation model on specificity annotated data can generate questions with varied levels of specificity to the given context.


Non-Monotonic Sequential Text Generation
Kiante Brantley | Kyunghyun Cho | Hal Daumé | Sean Welleck
Proceedings of the 2019 Workshop on Widening NLP

Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary position, and then recursively generating words to its left and then words to its right, yielding a binary tree. Learning is framed as imitation learning, including a coaching method which moves from imitating an oracle to reinforcing the policy’s own preferences. Experimental results demonstrate that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order while achieving competitive performance with conventional left-to-right generation.
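
The binary tree mentioned in this abstract can be pictured with a small sketch. It only shows how a finished tree maps back to a sentence, under the assumption that an in-order traversal is used; the `Node` class and `read_out` function are hypothetical names, and no learned policy is involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One generated word, with the words generated to its left and right."""
    word: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def read_out(node: Optional[Node]) -> List[str]:
    """In-order traversal: left subtree, then this word, then right subtree."""
    if node is None:
        return []
    return read_out(node.left) + [node.word] + read_out(node.right)


# "are" is generated first, then "how" to its left and "you" to its right;
# "?" is generated last, to the right of "you".
tree = Node("are", left=Node("how"), right=Node("you", right=Node("?")))
sentence = read_out(tree)
```

The generation order here (are, how, you, ?) differs from the left-to-right reading order, which is what makes the process non-monotonic.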

2018

Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings
Han-Chin Shing | Suraj Nair | Ayah Zirikly | Meir Friedenberg | Hal Daumé III | Philip Resnik
Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic

We report on the creation of a dataset for studying assessment of suicide risk via online postings in Reddit. Evaluation of risk-level annotations by experts yields what is, to our knowledge, the first demonstration of reliability in risk assessment by clinicians based on social media postings. We also introduce and demonstrate the value of a new, detailed rubric for assessing suicide risk, compare crowdsourced with expert performance, and present baseline predictive modeling experiments using the new dataset, which will be made available to researchers through the American Association of Suicidology.

Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information
Sudha Rao | Hal Daumé III
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Inquiry is fundamental to communication, and machines cannot effectively collaborate with humans unless they can ask questions. In this work, we build a neural network model for the task of ranking clarification questions. Our model is inspired by the idea of expected value of perfect information: a good question is one whose expected answer will be useful. We study this problem using data from StackExchange, a plentiful online resource in which people routinely ask clarifying questions to posts so that they can better offer assistance to the original poster. We create a dataset of clarification questions consisting of 77K posts paired with a clarification question (and answer) from three domains of StackExchange: askubuntu, unix and superuser. We evaluate our model on 500 samples of this dataset against expert human judgments and demonstrate significant improvements over controlled baselines.
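
The expected-value idea in this abstract can be sketched in a few lines. This is an illustration only, not the paper's neural model: the function names, candidate questions, answer probabilities, and utility values are all made up.

```python
def expected_utility(answer_distribution):
    """Expected utility of asking a question.

    answer_distribution: list of (probability, utility) pairs, one per
    possible answer. Both numbers are assumed to come from some model.
    """
    return sum(prob * utility for prob, utility in answer_distribution)


def rank_questions(candidates):
    """Sort candidate questions from most to least useful in expectation."""
    return sorted(
        candidates,
        key=lambda question: expected_utility(candidates[question]),
        reverse=True,
    )


# Hypothetical candidates for a post about a failing software install.
candidates = {
    "Have you tried rebooting?": [(0.5, 0.2), (0.5, 0.0)],
    "Which version of Ubuntu are you running?": [(0.9, 0.8), (0.1, 0.1)],
}
ranking = rank_questions(candidates)
```

With these numbers the version question ranks first, because its likely answers would add more to the post than the likely answers to the generic question.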

Content Selection in Deep Learning Models of Summarization
Chris Kedzie | Kathleen McKeown | Hal Daumé III
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We carry out experiments with deep learning models of summarization across the domains of news, personal stories, meetings, and medical articles in order to understand how content selection is performed. We find that many sophisticated features of state-of-the-art extractive summarizers do not improve performance over simpler models. These results suggest that it is easier to create a summarizer for a new domain than previous work suggests and bring into question the benefit of deep learning models for summarization for those domains that do have massive datasets (i.e., news). At the same time, they suggest important questions for new research in summarization; namely, new forms of sentence representations or external knowledge sources are needed that are better suited to the summarization task.

2017

Biomedical Event Extraction using Abstract Meaning Representation
Sudha Rao | Daniel Marcu | Kevin Knight | Hal Daumé III
BioNLP 2017

We propose a novel, Abstract Meaning Representation (AMR) based approach to identifying molecular events/interactions in biomedical text. Our key contributions are: (1) an empirical validation of our hypothesis that an event is a subgraph of the AMR graph, (2) a neural network-based model that identifies such an event subgraph given an AMR, and (3) a distant supervision based approach to gather additional training data. We evaluate our approach on the 2013 Genia Event Extraction dataset and show promising results.

pdf
Structured Prediction via Learning to Search under Bandit Feedback
Amr Sharaf | Hal Daumé III
Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing

We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting in which it only observes a loss, and more fine-grained feedback in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. We empirically compare a number of different algorithms and exploration methods and show the efficacy of BLS on sequence labeling and dependency parsing tasks.

pdf
The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
Amr Sharaf | Shi Feng | Khanh Nguyen | Kianté Brantley | Hal Daumé III
Proceedings of the Second Conference on Machine Translation

pdf bib
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems
Emily Bender | Hal Daumé III | Allyson Ettinger | Sudha Rao
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

pdf bib
Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task
Allyson Ettinger | Sudha Rao | Hal Daumé III | Emily M. Bender
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper presents a summary of the first Workshop on Building Linguistically Generalizable Natural Language Processing Systems, and the associated Build It Break It, The Language Edition shared task. The goal of this workshop was to bring together researchers in NLP and linguistics with a carefully designed shared task aimed at testing the generalizability of NLP systems beyond the distributions of their training data. We describe the motivation, setup, and participation of the shared task, provide discussion of some highlighted results, and discuss lessons learned.

pdf
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
Khanh Nguyen | Hal Daumé III | Jordan Boyd-Graber
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.

2016

pdf
CLIP@UMD at SemEval-2016 Task 8: Parser for Abstract Meaning Representation using Learning to Search
Sudha Rao | Yogarshi Vyas | Hal Daumé III | Philip Resnik
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation
He He | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships
Mohit Iyyer | Anupam Guha | Snigdha Chaturvedi | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Learning Text Pair Similarity with Context-sensitive Autoencoders
Hadi Amiri | Philip Resnik | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the Workshop on Human-Computer Question Answering
Mohit Iyyer | He He | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the Workshop on Human-Computer Question Answering

pdf
The UMD CLPsych 2016 Shared Task System: Text Representation for Predicting Triage of Forum Posts about Mental Health
Meir Friedenberg | Hadi Amiri | Hal Daumé III | Philip Resnik
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

pdf
A Framework for Discriminative Rule Selection in Hierarchical Moses
Fabienne Braune | Alexander Fraser | Hal Daumé III | Aleš Tamchyna
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

2015

pdf
Dialogue focus tracking for zero pronoun resolution
Sudha Rao | Allyson Ettinger | Hal Daumé III | Philip Resnik
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Hands-on Learning to Search for Structured Prediction
Hal Daumé III | John Langford | Kai-Wei Chang | He He | Sudha Rao
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

pdf
Why discourse affects speakers’ choice of referring expressions
Naho Orita | Eliana Vornov | Naomi Feldman | Hal Daumé III
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Deep Unordered Composition Rivals Syntactic Methods for Text Classification
Mohit Iyyer | Varun Manjunatha | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Syntax-based Rewriting for Simultaneous Machine Translation
He He | Alvin Grissom II | John Morgan | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf
“I Object!” Modeling Latent Pragmatic Effects in Courtroom Dialogues
Dan Goldwasser | Hal Daumé III
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
A Neural Network for Factoid Question Answering over Paragraphs
Mohit Iyyer | Jordan Boyd-Graber | Leonardo Claudino | Richard Socher | Hal Daumé III
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Don’t Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation
Alvin Grissom II | He He | Jordan Boyd-Graber | John Morgan | Hal Daumé III
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Understanding MOOC Discussion Forums using Seeded LDA
Arti Ramesh | Dan Goldwasser | Bert Huang | Hal Daumé | Lise Getoor
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

pdf
A Unified Model for Soft Linguistic Reordering Constraints in Statistical Machine Translation
Junhui Li | Yuval Marton | Philip Resnik | Hal Daumé III
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Predicting Instructor’s Intervention in MOOC forums
Snigdha Chaturvedi | Dan Goldwasser | Hal Daumé III
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf
Measuring Machine Translation Errors in New Domains
Ann Irvine | John Morgan | Marine Carpuat | Hal Daumé III | Dragos Munteanu
Transactions of the Association for Computational Linguistics, Volume 1

We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.

pdf
Monolingual Marginal Matching for Translation Model Adaptation
Ann Irvine | Chris Quirk | Hal Daumé III
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf
Dynamic Feature Selection for Dependency Parsing
He He | Hal Daumé III | Jason Eisner
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Lucy Vanderwende | Hal Daumé III | Katrin Kirchhoff
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Modeling Syntactic and Semantic Structures in Hierarchical Phrase-based Translation
Junhui Li | Philip Resnik | Hal Daumé III
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
SenseSpotting: Never let your parallel data tie you to an old domain
Marine Carpuat | Hal Daumé III | Katharine Henry | Ann Irvine | Jagadeesh Jagarlamudi | Rachel Rudinger
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Domain Adaptation in Machine Translation: Findings from the 2012 Johns Hopkins Summer Workshop
Hal Daumé III | Marine Carpuat | Alex Fraser | Chris Quirk
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Keynote Presentations

pdf bib
Regularized Interlingual Projections: Evaluation on Multilingual Transliteration
Jagadeesh Jagarlamudi | Hal Daumé III
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf
Fast Large-Scale Approximate Graph Construction for NLP
Amit Goyal | Hal Daumé III | Raul Guerra
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf
Sketch Algorithms for Estimating Point Queries in NLP
Amit Goyal | Hal Daumé III | Graham Cormode
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf
Besting the Quiz Master: Crowdsourcing Incremental Classification Games
Jordan Boyd-Graber | Brianna Satinoff | He He | Hal Daumé III
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf
Low-Dimensional Discriminative Reranking
Jagadeesh Jagarlamudi | Hal Daumé III
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Detecting Visual Text
Jesse Dodge | Amit Goyal | Xufeng Han | Alyssa Mensch | Margaret Mitchell | Karl Stratos | Kota Yamaguchi | Yejin Choi | Hal Daumé III | Alex Berg | Tamara Berg
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the First Workshop on Multilingual Modeling
Jagadeesh Jagarlamudi | Sujith Ravi | Xiaojun Wan | Hal Daume III
Proceedings of the First Workshop on Multilingual Modeling

pdf
Incorporating Lexical Priors into Topic Models
Jagadeesh Jagarlamudi | Hal Daumé III | Raghavendra Udupa
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Midge: Generating Image Descriptions From Computer Vision Detections
Margaret Mitchell | Jesse Dodge | Amit Goyal | Kota Yamaguchi | Karl Stratos | Xufeng Han | Alyssa Mensch | Alex Berg | Tamara Berg | Hal Daumé III
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf
Generating Semantic Orientation Lexicon using Large Data and Thesaurus
Amit Goyal | Hal Daumé
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

pdf
Approximate Scalable Bounded Space Sketch for Large Data NLP
Amit Goyal | Hal Daumé III
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf
Corpus-Guided Sentence Generation of Natural Images
Yezhou Yang | Ching Teo | Hal Daumé III | Yiannis Aloimonos
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf
Improving Bilingual Projections via Sparse Covariance Matrices
Jagadeesh Jagarlamudi | Raghavendra Udupa | Hal Daumé III | Abhijit Bhole
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf
From Bilingual Dictionaries to Interlingual Document Representations
Jagadeesh Jagarlamudi | Hal Daumé III | Raghavendra Udupa
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
Domain Adaptation for Machine Translation by Mining Unseen Words
Hal Daumé III | Jagadeesh Jagarlamudi
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Beyond Structured Prediction: Inverse Reinforcement Learning
Hal Daumé III
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2010

pdf
Automatically Producing Plot Unit Representations for Narrative Text
Amit Goyal | Ellen Riloff | Hal Daumé III
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf
From Structured Prediction to Inverse Reinforcement Learning
Hal Daumé III
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

pdf
Domain Adaptation meets Active Learning
Piyush Rai | Avishek Saha | Hal Daumé | Suresh Venkatasubramanian
Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing

pdf
Toward Plot Units: Automatic Affect State Analysis
Amit Goyal | Ellen Riloff | Hal Daume III | Nathan Gilbert
Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text

pdf
Sketching Techniques for Large Scale NLP
Amit Goyal | Jagadeesh Jagarlamudi | Hal Daumé III | Suresh Venkatasubramanian
Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop

pdf bib
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
Hal Daumé III | Tejaswini Deoskar | David McClosky | Barbara Plank | Jörg Tiedemann
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

pdf
Frustratingly Easy Semi-Supervised Domain Adaptation
Hal Daumé III | Abhishek Kumar | Avishek Saha
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

pdf
Sketch Techniques for Scaling Distributional Similarity to the Web
Amit Goyal | Jagadeesh Jagarlamudi | Hal Daumé III | Suresh Venkatasubramanian
Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics

2009

pdf
Streaming for large scale NLP: Language Modeling
Amit Goyal | Hal Daumé III | Suresh Venkatasubramanian
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf
Non-Parametric Bayesian Areal Linguistics
Hal Daumé III
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf
Markov Random Topic Fields
Hal Daumé III
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

pdf
Name Translation in Statistical Machine Translation - Learning When to Transliterate
Ulf Hermjakob | Kevin Knight | Hal Daumé III
Proceedings of ACL-08: HLT

pdf
Cross-Task Knowledge-Constrained Self Training
Hal Daumé III
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf
A Bayesian Model for Discovering Typological Implications
Hal Daumé III | Lyle Campbell
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf
Frustratingly Easy Domain Adaptation
Hal Daumé III
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf
Bayesian Query-Focused Summarization
Hal Daumé III | Daniel Marcu
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Beyond EM: Bayesian Techniques for Human Language Technology Researchers
Hal Daume III
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

pdf bib
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing
Ryan McDonald | Charles Sutton | Hal Daumé III | Andrew McCallum | Fernando Pereira | Jeff Bilmes
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing

2005

pdf
Induction of Word and Phrase Alignments for Automatic Document Summarization
Hal Daumé III | Daniel Marcu
Computational Linguistics, Volume 31, Number 4, December 2005

pdf
A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model
Hal Daumé III | Daniel Marcu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

pdf
Web Search Intent Induction via Automatic Query Reformulation
Hal Daumé III | Eric Brill
Proceedings of HLT-NAACL 2004: Short Papers

pdf
Generic Sentence Fusion is an Ill-Defined Summarization Task
Hal Daume III | Daniel Marcu
Text Summarization Branches Out

pdf
A Phrase-Based HMM Approach to Document/Abstract Alignment
Hal Daumé III | Daniel Marcu
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf
NP Bracketing by Maximum Entropy Tagging and SVM Reranking
Hal Daumé III | Daniel Marcu
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2002

pdf bib
The Importance of Lexicalized Syntax Models for Natural Language Generation Tasks
Hal Daume III | Kevin Knight | Irene Langkilde-Geary | Daniel Marcu | Kenji Yamada
Proceedings of the International Natural Language Generation Conference

pdf
A Noisy-Channel Model for Document Compression
Hal Daume III | Daniel Marcu
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

pdf
Integrated Information Management: An Interactive, Extensible Architecture for Information Retrieval
Eric Nyberg | Hal Daume
Proceedings of the First International Conference on Human Language Technology Research
