Michael Gamon


2022

LITE: Intent-based Task Representation Learning Using Weak Supervision
Naoki Otani | Michael Gamon | Sujay Kumar Jauhar | Mei Yang | Sri Raghu Malireddi | Oriana Riva
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Users write to-dos as personal notes to themselves, about things they need to complete, remember or organize. To-do texts are usually short and under-specified, which poses a challenge for current text representation models. Yet, understanding and representing their meaning is the first step towards providing intelligent assistance for to-do management. We address this problem by proposing a neural multi-task learning framework, LITE, which extracts representations of English to-do tasks with a multi-head attention mechanism on top of a pre-trained text encoder. To adapt representation models to to-do texts, we collect weak-supervision labels from semantically rich external resources (e.g., dynamic commonsense knowledge bases), following the principle that to-do tasks with similar intents have similar labels. We then train the model on multiple generative/predictive training objectives jointly. We evaluate our representation model on four downstream tasks and show that our approach consistently improves performance over baseline models, achieving error reduction of up to 38.7%.

MS-LaTTE: A Dataset of Where and When To-do Tasks are Completed
Sujay Kumar Jauhar | Nirupama Chandrasekaran | Michael Gamon | Ryen White
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Tasks are a fundamental unit of work in the daily lives of people, who are increasingly using digital means to keep track of, organize, triage, and act on them. These digital tools – such as task management applications – provide a unique opportunity to study and understand tasks and their connection to the real world, and through intelligent assistance, help people be more productive. By logging signals such as text, timestamp information, and social connectivity graphs, an increasingly rich and detailed picture of how tasks are created and organized, what makes them important, and who acts on them, can be progressively developed. Yet the context around actual task completion remains fuzzy, due to the basic disconnect between actions taken in the real world and telemetry recorded in the digital world. Thus, in this paper we compile and release a novel, real-life, large-scale dataset called MS-LaTTE that captures two core aspects of the context surrounding task completion: location and time. We describe our annotation framework and conduct a number of analyses on the data that were collected, demonstrating that it captures intuitive contextual properties for common tasks. Finally, we test the dataset on the two problems of predicting spatial and temporal task co-occurrence, concluding that predictors for co-location and co-time are both learnable, with a BERT fine-tuned model outperforming several other baselines. The MS-LaTTE dataset provides an opportunity to tackle many new modeling challenges in contextual task understanding and we hope that its release will spur future research in task intelligence more broadly.

One Document, Many Revisions: A Dataset for Classification and Description of Edit Intents
Dheeraj Rajagopal | Xuchao Zhang | Michael Gamon | Sujay Kumar Jauhar | Diyi Yang | Eduard Hovy
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Document authoring involves a lengthy revision process, marked by individual edits that are frequently linked to comments. Modeling the relationship between edits and comments leads to a better understanding of document evolution, potentially benefiting applications such as content summarization and task triaging. Prior work on understanding revisions has primarily focused on classifying edit intents, but falls short of a deeper understanding of the nature of these edits. In this paper, we explore the challenge of describing an edit at two levels: identifying the edit intent, and describing the edit using free-form text. We begin by defining a taxonomy of general edit intents and introduce a new dataset of full revision histories of Wikipedia pages, annotated with each revision’s edit intent. Using this dataset, we train a classifier that achieves 90% accuracy in identifying edit intent. We use this classifier to train a distantly-supervised model that generates a high-level description of a revision in free-form text. Our experimental results show that incorporating edit intent information aids in generating better edit descriptions. We establish a set of baselines for the edit description task, achieving a best score of 28 ROUGE, thus demonstrating the effectiveness of our layered approach to edit understanding.

2020

SemEval-2020 Task 7: Assessing Humor in Edited News Headlines
Nabil Hossain | John Krumm | Michael Gamon | Henry Kautz
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the SemEval-2020 shared task “Assessing Humor in Edited News Headlines.” The task’s dataset contains news headlines in which short edits were applied to make them funny, and the funniness of these edited headlines was rated using crowdsourcing. This task includes two subtasks, the first of which is to estimate the funniness of headlines on a humor scale in the interval 0-3. The second subtask is to predict, for a pair of edited versions of the same original headline, which is the funnier version. To date, this task is the most popular shared computational humor task, attracting 48 teams for the first subtask and 31 teams for the second.

2019

Modeling the Relationship between User Comments and Edits in Document Revision
Xuchao Zhang | Dheeraj Rajagopal | Michael Gamon | Sujay Kumar Jauhar | ChangTien Lu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Management of collaborative documents can be difficult, given the profusion of edits and comments that multiple authors make during a document’s evolution. Reliably modeling the relationship between edits and comments is a crucial step towards helping the user keep track of a document in flux. A number of authoring tasks, such as categorizing and summarizing edits, detecting completed to-dos, and visually rearranging comments could benefit from such a contribution. Thus, in this paper we explore the relationship between comments and edits by defining two novel, related tasks: Comment Ranking and Edit Anchoring. We begin by collecting a dataset with more than half a million comment-edit pairs based on Wikipedia revision histories. We then propose a hierarchical multi-layer deep neural network to model the relationship between edits and comments. Our architecture tackles both Comment Ranking and Edit Anchoring tasks by encoding specific edit actions such as additions and deletions, while also accounting for document context. In a number of evaluation settings, our experimental results show that our approach significantly outperforms several strong baselines. We are able to achieve a precision@1 of 71.0% and a precision@3 of 94.4% for Comment Ranking, while we achieve 74.4% accuracy on Edit Anchoring.

“President Vows to Cut <Taxes> Hair”: Dataset and Analysis of Creative Text Editing for Humorous Headlines
Nabil Hossain | John Krumm | Michael Gamon
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce, release, and analyze a new dataset, called Humicroedit, for research in computational humor. Our publicly available data consists of regular English news headlines paired with versions of the same headlines that contain simple replacement edits designed to make them funny. We carefully curated crowdsourced editors to create funny headlines and judges to score a total of 15,095 edited headlines, with five judges per headline. The simple edits, usually just a single word replacement, mean we can apply straightforward analysis techniques to determine what makes our edited headlines humorous. We show how the data support classic theories of humor, such as incongruity, superiority, and setup/punchline. Finally, we develop baseline classifiers that can predict whether or not an edited headline is funny, which is a first step toward automatically generating humorous headlines as an approach to creating topical humor.

2016

Activity Modeling in Email
Ashequl Qadir | Michael Gamon | Patrick Pantel | Ahmed Hassan Awadallah
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

Representing Text for Joint Embedding of Text and Knowledge Bases
Kristina Toutanova | Danqi Chen | Patrick Pantel | Hoifung Poon | Pallavi Choudhury | Michael Gamon
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)
Atefeh Farzindar | Diana Inkpen | Michael Gamon | Meena Nagarajan
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)

Modeling Interestingness with Deep Neural Networks
Jianfeng Gao | Patrick Pantel | Michael Gamon | Xiaodong He | Li Deng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Smart Selection
Patrick Pantel | Michael Gamon | Ariel Fuxman
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Predicting Interesting Things in Text
Michael Gamon | Arjun Mukherjee | Patrick Pantel
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Proceedings of the Workshop on Language Analysis in Social Media
Cristian Danescu-Niculescu-Mizil | Atefeh Farzindar | Michael Gamon | Diana Inkpen | Meena Nagarajan
Proceedings of the Workshop on Language Analysis in Social Media

Revisiting the Old Kitchen Sink: Do we Need Sentiment Domain Adaptation?
Riham Mansour | Nesma Refaei | Michael Gamon | Ahmed Abdul-Hamid | Khaled Sami
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

Mining Entity Types from Query Logs via User Intent Modeling
Patrick Pantel | Thomas Lin | Michael Gamon
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Predicting Responses to Microblog Posts
Yoav Artzi | Patrick Pantel | Michael Gamon
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

MSR SPLAT, a language analysis toolkit
Chris Quirk | Pallavi Choudhury | Jianfeng Gao | Hisami Suzuki | Kristina Toutanova | Michael Gamon | Wen-tau Yih | Colin Cherry | Lucy Vanderwende
Proceedings of the Demonstration Session at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Underspecified Query Refinement via Natural Language Question Generation
Hassan Sajjad | Patrick Pantel | Michael Gamon
Proceedings of COLING 2012

Proceedings of the Second Workshop on Language in Social Media
Sara Owsley Sood | Meenakshi Nagarajan | Michael Gamon
Proceedings of the Second Workshop on Language in Social Media

2011

Proceedings of the Workshop on Language in Social Media (LSM 2011)
Meenakshi Nagarajan | Michael Gamon
Proceedings of the Workshop on Language in Social Media (LSM 2011)

High-Order Sequence Modeling for Language Learner Error Detection
Michael Gamon
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications

MSR-NLP Entry in BioNLP Shared Task 2011
Chris Quirk | Pallavi Choudhury | Michael Gamon | Lucy Vanderwende
Proceedings of BioNLP Shared Task 2011 Workshop

2010

Using Mostly Native Data to Correct Errors in Learners’ Writing
Michael Gamon
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Search right and thou shalt find ... Using Web Queries for Learner Error Detection
Michael Gamon | Claudia Leacock
Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications

2009

User Input and Interactions on Microsoft Research ESL Assistant
Claudia Leacock | Michael Gamon | Chris Brockett
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications

2008

Using Contextual Speller Techniques and Language Modeling for ESL Error Correction
Michael Gamon | Jianfeng Gao | Chris Brockett | Alexandre Klementiev | William B. Dolan | Dmitriy Belenko | Lucy Vanderwende
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

2007

Book Reviews: Computing Attitude and Affect in Text: Theory and Applications, edited by James G. Shanahan, Yan Qu, and Janyce Wiebe
Michael Gamon
Computational Linguistics, Volume 33, Number 2, June 2007

2006

Correcting ESL Errors Using Phrasal SMT Techniques
Chris Brockett | William B. Dolan | Michael Gamon
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Obfuscating Document Stylometry to Preserve Author Anonymity
Gary Kacmarcik | Michael Gamon
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Proceedings of the Workshop on Sentiment and Subjectivity in Text
Michael Gamon | Anthony Aue
Proceedings of the Workshop on Sentiment and Subjectivity in Text

Graph-Based Text Representation for Novelty Detection
Michael Gamon
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing

2005

Automatic Identification of Sentiment Vocabulary: Exploiting Low Association with Known Sentiment Terms
Michael Gamon | Anthony Aue
Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing

Sentence-level MT evaluation without reference translations: beyond language modeling
Michael Gamon | Anthony Aue | Martine Smets
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

Normalizing German and English inflectional morphology to improve statistical word alignment
Simon Corston-Oliver | Michael Gamon
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

German has a richer system of inflectional morphology than English, which causes problems for current approaches to statistical word alignment. Using Giza++ as a reference implementation of the IBM Model 1, an HMM-based alignment and IBM Model 4, we measure the impact of normalizing inflectional morphology on German-English statistical word alignment. We demonstrate that normalizing inflectional morphology improves the perplexity of models and reduces alignment errors.

Task-Focused Summarization of Email
Simon Corston-Oliver | Eric Ringger | Michael Gamon | Richard Campbell
Text Summarization Branches Out

Linguistic correlates of style: authorship classification with deep linguistic analysis features
Michael Gamon
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Linguistically Informed Statistical Models of Constituent Structure for Ordering in Sentence Realization
Eric Ringger | Michael Gamon | Robert C. Moore | David Rojas | Martine Smets | Simon Corston-Oliver
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis
Michael Gamon
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Combining decision trees and transformation-based learning to correct transferred linguistic representations
Simon Corston-Oliver | Michael Gamon
Proceedings of Machine Translation Summit IX: Papers

We present a hybrid approach to correcting features in transferred linguistic representations in machine translation. The approach combines decision trees and transformation-based learning. Decision trees serve as a filter on the intractably large search space of possible interrelations among features. Transformation-based learning results in a simple set of ordered rules that can be compiled and executed after transfer and before sentence realization in the target language. We measure the reduction in noise in the linguistic representations and the results of human evaluations of end-to-end English-German machine translation.

High quality machine translation using a machine-learned sentence realization component
Martine Smets | Michael Gamon | Jessie Pinkham | Tom Reutter | Martine Pettenaro
Proceedings of Machine Translation Summit IX: Papers

We describe the implementation of two new language pairs (English-French and English-German) which use machine-learned sentence realization components instead of hand-written generation components. The resulting systems are evaluated by human evaluators and, in the technical domain, are equal in quality to highly respected commercial systems. We comment on the difficulties that are encountered when using machine-learned sentence realization in the context of MT.

French Amalgam: a quick adaptation of a sentence realization system to French
Martine Smets | Michael Gamon | Simon Corston-Oliver | Eric Ringger
10th Conference of the European Chapter of the Association for Computational Linguistics

French Amalgam: A machine-learned sentence realization system
Martine Smets | Michael Gamon | Simon Corston-Oliver | Eric Ringger
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This paper presents the French implementation of Amalgam, a machine-learned sentence realization system. It presents in some detail two of the machine-learned models employed in Amalgam and shows how linguistic intuition and knowledge can be combined with statistical techniques to improve the performance of the models.

2002

An Overview of Amalgam: A Machine-learned Generation Module
Simon Corston-Oliver | Michael Gamon | Eric Ringger | Robert Moore
Proceedings of the International Natural Language Generation Conference

Extraposition: A Case Study in German Sentence Realization
Michael Gamon | Eric Ringger | Zhu Zhang | Robert Moore | Simon Corston-Oliver
COLING 2002: The 19th International Conference on Computational Linguistics

Machine-learned contexts for linguistic operations in German sentence realization
Michael Gamon | Eric Ringger | Simon Corston-Oliver | Robert Moore
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

A Machine Learning Approach to the Automatic Evaluation of Machine Translation
Simon Corston-Oliver | Michael Gamon | Chris Brockett
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Using machine learning for system-internal evaluation of transferred linguistic representations
Michael Gamon | Hisami Suzuki | Simon Corston-Oliver
Proceedings of Machine Translation Summit VIII

We present an automated, system-internal evaluation technique for linguistic representations in a large-scale, multilingual MT system. We use machine-learned classifiers to distinguish linguistic representations generated from transfer in an MT context from representations that are produced by "native" analysis of the target language. In the MT scenario, convergence of the two is the desired result. Holding the feature set and the learning algorithm constant, the accuracy of the classifiers provides a measure of the overall difference between the two sets of linguistic representations: classifiers with higher accuracy correspond to more pronounced differences between representations. More importantly, the classifiers yield the basis for error analysis by providing a ranking of the importance of linguistic features. The more salient a linguistic criterion is in discriminating transferred representations from "native" representations, the more work will be needed in order to get closer to the goal of producing native-like MT. We present results from using this approach on the Microsoft MT system and discuss its advantages and possible extensions.

1997

Practical Experience with Grammar Sharing in Multilingual NLP
Michael Gamon | Carmen Lozano | Jessie Pinkham | Tom Reutter
From Research to Commercial Applications: Making NLP Work in Practice