Steve DeNeefe
2024
Domain adapted machine translation: What does catastrophic forgetting forget and why?
Danielle Saunders | Steve DeNeefe
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Neural Machine Translation (NMT) models can be specialized by domain adaptation, often involving fine-tuning on a dataset of interest. This process risks catastrophic forgetting: rapid loss of generic translation quality. Forgetting has been widely observed, with many mitigation methods proposed. However, the causes of forgetting and the relationship between forgetting and adaptation data are underexplored. This paper takes a novel approach to understanding catastrophic forgetting during NMT adaptation by investigating the impact of the data. We provide a first investigation of what is forgotten, and why. We examine the relationship between forgetting and the in-domain data, and show that the amount and type of forgetting are linked to that data's target vocabulary coverage. Our findings pave the way toward better informed NMT domain adaptation.
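As an illustration of the coverage statistic the abstract refers to, here is a minimal sketch of measuring how well an in-domain corpus covers the target vocabulary of generic data. File names and whitespace tokenization are assumptions for illustration, not the authors' implementation:

```python
from collections import Counter

def vocab(path):
    """Collect counts of whitespace-tokenized target-side tokens in a corpus."""
    tokens = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens.update(line.split())
    return tokens

# Hypothetical file names: target sides of the generic and in-domain corpora.
generic = vocab("generic.tgt")
in_domain = vocab("in_domain.tgt")

# Type coverage: fraction of generic vocabulary types also seen in-domain.
type_coverage = len(set(generic) & set(in_domain)) / len(generic)

# Token coverage: fraction of generic running tokens whose type appears in-domain.
covered = sum(count for token, count in generic.items() if token in in_domain)
token_coverage = covered / sum(generic.values())

print(f"type coverage:  {type_coverage:.2%}")
print(f"token coverage: {token_coverage:.2%}")
```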
2023
AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature
Melissa Roemmele | Kyle Shaffer | Katrina Olsen | Yiyi Wang | Steve DeNeefe
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.
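The passage-level alignments described above lend themselves to simple corpus statistics. A sketch under assumed data layout (a tab-separated file with one original/abridged pair per row; the AbLit repository at github.com/roemmele/AbLit documents its own formats and loaders):

```python
import csv

# Hypothetical TSV with one aligned passage pair per row.
pairs = []
with open("alignments.tsv", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        pairs.append((row["original"], row["abridged"]))

def stats(original, abridged):
    """Characterize one alignment: how much was cut, how much kept verbatim."""
    o, a = original.split(), abridged.split()
    kept = len(set(a) & set(o))
    return {
        "compression": len(a) / len(o) if o else 0.0,  # abridged/original length
        "reuse": kept / len(set(a)) if a else 0.0,     # abridged types from original
    }

for original, abridged in pairs[:3]:
    print(stats(original, abridged))
```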
2021
AnswerQuest: A System for Generating Question-Answer Items from Multi-Paragraph Documents
Melissa Roemmele | Deep Sidhpura | Steve DeNeefe | Ling Tsou
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
One strategy for facilitating reading comprehension is to present information in a question-and-answer format. We demo a system that integrates the tasks of question answering (QA) and question generation (QG) in order to produce Q&A items that convey the content of multi-paragraph documents. We report experiments for QA and QG that yield improvements on both tasks, and assess how they interact to produce a list of Q&A items for a text. The demo is accessible at qna.sdl.com.
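The QG-then-QA integration can be sketched with off-the-shelf components. The model names below are illustrative stand-ins, not the models behind qna.sdl.com, and the confidence threshold is an assumption:

```python
from transformers import pipeline

# Illustrative public models; the demo's actual models are not specified here.
qg = pipeline("text2text-generation", model="valhalla/t5-base-e2e-qg")
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def qa_items(document, min_score=0.3):
    """Generate candidate questions, answer each against the source document,
    and keep only the pairs the QA model is confident about."""
    generated = qg("generate questions: " + document)[0]["generated_text"]
    items = []
    for question in filter(None, (q.strip() for q in generated.split("<sep>"))):
        answer = qa(question=question, context=document)
        if answer["score"] >= min_score:
            items.append((question, answer["answer"]))
    return items

doc = "The Amazon rainforest produces about 20 percent of Earth's oxygen."
for q, a in qa_items(doc):
    print(f"Q: {q}\nA: {a}")
```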
2011
Two Easy Improvements to Lexical Weighting
David Chiang | Steve DeNeefe | Michael Pust
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
2010
A Decoder for Probabilistic Synchronous Tree Insertion Grammars
Steve DeNeefe | Kevin Knight | Heiko Vogler
Proceedings of the 2010 Workshop on Applications of Tree Automata in Natural Language Processing
2009
Synchronous Tree Adjoining Machine Translation
Steve DeNeefe | Kevin Knight
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
2008
Overcoming Vocabulary Sparsity in MT Using Lattices
Steve DeNeefe | Ulf Hermjakob | Kevin Knight
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Source languages with complex word-formation rules present a challenge for statistical machine translation (SMT). In this paper, we take on three facets of this challenge: (1) common stems are fragmented into many different forms in training data, (2) rare and unknown words are frequent in test data, and (3) spelling variation creates additional sparseness problems. We present a novel, lightweight technique for dealing with this fragmentation, based on bilingual data, and we also present a combination of linguistic and statistical techniques for dealing with rare and unknown words. Taking these techniques together, we demonstrate +1.3 and +1.6 BLEU increases on top of strong baselines for Arabic-English machine translation.
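The lattice idea in the title encodes alternative analyses of each source word as parallel paths, letting the decoder pick whichever path it can translate. A minimal sketch of such a lattice as arcs between states; the transliteration and segmentation in the example are illustrative assumptions:

```python
from itertools import count

def add_word(lattice, start, end, alternatives, fresh):
    """Add one source word's alternative analyses as parallel paths from
    state `start` to state `end`; `fresh` yields unused intermediate states."""
    for alt in alternatives:
        node = start
        for i, token in enumerate(alt):
            nxt = end if i == len(alt) - 1 else next(fresh)
            lattice.setdefault(node, []).append((token, nxt))
            node = nxt

# Illustrative example: the Arabic token "wAlktAb" ("and the book") can be
# kept whole or split into conjunction + article + stem.
lattice, fresh = {}, count(2)  # states 0 (start) and 1 (end) are reserved
add_word(lattice, 0, 1, [["wAlktAb"], ["w+", "Al+", "ktAb"]], fresh)

for state in sorted(lattice):
    for token, nxt in lattice[state]:
        print(f"{state} --{token}--> {nxt}")
```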
Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms
David Chiang | Steve DeNeefe | Yee Seng Chan | Hwee Tou Ng
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
2007
What Can Syntax-Based MT Learn from Phrase-Based MT?
Steve DeNeefe | Kevin Knight | Wei Wang | Daniel Marcu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
2006
Scalable Inference and Training of Context-Rich Syntactic Translation Models
Michel Galley | Jonathan Graehl | Kevin Knight | Daniel Marcu | Steve DeNeefe | Wei Wang | Ignacio Thayer
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics