Scott Miller
2022
DEGREE: A Data-Efficient Generation-Based Event Extraction Model
I-Hung Hsu | Kuan-Hao Huang | Elizabeth Boschee | Scott Miller | Prem Natarajan | Kai-Wei Chang | Nanyun Peng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Event extraction requires high-quality expert human annotations, which are usually expensive. Therefore, learning a data-efficient event extraction model that can be trained with only a few labeled examples has become a crucial challenge. In this paper, we focus on low-resource end-to-end event extraction and propose DEGREE, a data-efficient model that formulates event extraction as a conditional generation problem. Given a passage and a manually designed prompt, DEGREE learns to summarize the events mentioned in the passage into a natural sentence that follows a predefined pattern. The final event predictions are then extracted from the generated sentence with a deterministic algorithm. DEGREE has three advantages to learn well with less training data. First, our designed prompts provide semantic guidance for DEGREE to leverage label semantics and thus better capture the event arguments. Moreover, DEGREE is capable of using additional weakly-supervised information, such as the description of events encoded in the prompts. Finally, DEGREE learns triggers and arguments jointly in an end-to-end manner, which encourages the model to better utilize the shared knowledge and dependencies among them. Our experimental results demonstrate the strong performance of DEGREE for low-resource event extraction.
2020
SEARCHER: Shared Embedding Architecture for Effective Retrieval
Joel Barry | Elizabeth Boschee | Marjorie Freedman | Scott Miller
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
We describe an approach to cross lingual information retrieval that does not rely on explicit translation of either document or query terms. Instead, both queries and documents are mapped into a shared embedding space where retrieval is performed. We discuss potential advantages of the approach in handling polysemy and synonymy. We present a method for training the model, and give details of the model implementation. We present experimental results for two cases: Somali-English and Bulgarian-English CLIR.
2019
The Challenges of Optimizing Machine Translation for Low Resource Cross-Language Information Retrieval
Constantine Lignos | Daniel Cohen | Yen-Chieh Lien | Pratik Mehta | W. Bruce Croft | Scott Miller
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
When performing cross-language information retrieval (CLIR) for lower-resourced languages, a common approach is to retrieve over the output of machine translation (MT). However, there is no established guidance on how to optimize the resulting MT-IR system. In this paper, we examine the relationship between the performance of MT systems and both neural and term frequency-based IR models to identify how CLIR performance can be best predicted from MT quality. We explore performance at varying amounts of MT training data, byte pair encoding (BPE) merge operations, and across two IR collections and retrieval models. We find that the choice of IR collection can substantially affect the predictive power of MT tuning decisions and evaluation, potentially introducing dissociations between MT-only and overall CLIR performance.
Cross-lingual Joint Entity and Word Embedding to Improve Entity Linking and Parallel Sentence Mining
Xiaoman Pan | Thamme Gowda | Heng Ji | Jonathan May | Scott Miller
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Entities, which refer to distinct objects in the real world, can be viewed as language universals and used as effective signals to generate less ambiguous semantic representations and align multiple languages. We propose a novel method, CLEW, to generate cross-lingual data that is a mix of entities and contextual words based on Wikipedia. We replace each anchor link in the source language with its corresponding entity title in the target language if it exists, or in the source language otherwise. A cross-lingual joint entity and word embedding learned from this kind of data not only can disambiguate linkable entities but can also effectively represent unlinkable entities. Because this multilingual common space directly relates the semantics of contextual words in the source language to that of entities in the target language, we leverage it for unsupervised cross-lingual entity linking. Experimental results show that CLEW significantly advances the state-of-the-art: up to 3.1% absolute F-score gain for unsupervised cross-lingual entity linking. Moreover, it provides reliable alignment on both the word/entity level and the sentence level, and thus we use it to mine parallel sentences for all $\binom{302}{2}$ language pairs in Wikipedia.
SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage
Elizabeth Boschee | Joel Barry | Jayadev Billa | Marjorie Freedman | Thamme Gowda | Constantine Lignos | Chester Palen-Michel | Michael Pust | Banriskhem Kayang Khonglah | Srikanth Madikeri | Jonathan May | Scott Miller
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
With the increasing democratization of electronic media, vast information resources are available in less-frequently-taught languages such as Swahili or Somali. That information, which may be crucially important and not available elsewhere, can be difficult for monolingual English speakers to effectively access. In this paper we present an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign language repositories of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed. The SARAL system achieved the top end-to-end performance in the most recent IARPA MATERIAL CLIR+summarization evaluations. Our demonstration system provides end-to-end open query retrieval and summarization capability, and presents the original source text or audio, speech transcription, and machine translation, for two low resource languages.
2017
CADET: Computer Assisted Discovery Extraction and Translation
Benjamin Van Durme | Tom Lippincott | Kevin Duh | Deana Burchfield | Adam Poliak | Cash Costello | Tim Finin | Scott Miller | James Mayfield | Philipp Koehn | Craig Harman | Dawn Lawrie | Chandler May | Max Thomas | Annabelle Carrell | Julianne Chaloux | Tongfei Chen | Alex Comerford | Mark Dredze | Benjamin Glass | Shudong Hao | Patrick Martin | Pushpendre Rastogi | Rashmi Sankepally | Travis Wolfe | Ying-Ying Tran | Ted Zhang
Proceedings of the IJCNLP 2017, System Demonstrations
Computer Assisted Discovery Extraction and Translation (CADET) is a workbench for helping knowledge workers find, label, and translate documents of interest. It combines a multitude of analytics together with a flexible environment for customizing the workflow for different users. This open-source framework allows for easy development of new research prototypes using a micro-service architecture based atop Docker and Apache Thrift.
2012
Use of Modality and Negation in Semantically-Informed Syntactic MT
Kathryn Baker | Michael Bloodgood | Bonnie J. Dorr | Chris Callison-Burch | Nathaniel W. Filardo | Christine Piatko | Lori Levin | Scott Miller
Computational Linguistics, Volume 38, Issue 2 - June 2012
2010
Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach
Kathryn Baker | Michael Bloodgood | Chris Callison-Burch | Bonnie Dorr | Nathaniel Filardo | Lori Levin | Scott Miller | Christine Piatko
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the MT community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality—and further demonstrates that large gains can be achieved for low-resource languages with different word order than English.
2004
Name Tagging with Word Clusters and Discriminative Training
Scott Miller | Jethran Guinness | Alex Zamanian
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004
2001
Experiments in Multi-Modal Automatic Content Extraction
Lance Ramshaw | Elizabeth Boschee | Sergey Bratus | Scott Miller | Rebecca Stone | Ralph Weischedel | Alex Zamanian
Proceedings of the First International Conference on Human Language Technology Research
FactBrowser Demonstration
Scott Miller | Sergey Bratus | Lance Ramshaw | Ralph Weischedel | Alex Zamanian
Proceedings of the First International Conference on Human Language Technology Research
2000
A Novel Use of Statistical Parsing to Extract Information from Text
Scott Miller | Heidi Fox | Lance Ramshaw | Ralph Weischedel
1st Meeting of the North American Chapter of the Association for Computational Linguistics
1998
BBN: Description of the SIFT System as Used for MUC-7
Scott Miller | Michael Crystal | Heidi Fox | Lance Ramshaw | Richard Schwartz | Rebecca Stone | Ralph Weischedel | The Annotation Group
Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998
Semantic Tagging using a Probabilistic Context Free Grammar
Michael Collins | Scott Miller
Sixth Workshop on Very Large Corpora
Algorithms That Learn to Extract Information BBN: TIPSTER Phase III
Scott Miller | Michael Crystal | Heidi Fox | Lance Ramshaw | Richard Schwartz | Rebecca Stone | Ralph Weischedel
TIPSTER TEXT PROGRAM PHASE III: Proceedings of a Workshop held at Baltimore, Maryland, October 13-15, 1998
1997
Nymble: a High-Performance Learning Name-finder
Daniel M. Bikel | Scott Miller | Richard Schwartz | Ralph Weischedel
Fifth Conference on Applied Natural Language Processing
1996
A Fully Statistical Approach to Natural Language Interfaces
Scott Miller | David Stallard | Robert Bobrow | Richard Schwartz
34th Annual Meeting of the Association for Computational Linguistics
1994
Automatic Grammar Acquisition
Scott Miller | Heidi J. Fox
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994
Statistical Language Processing Using Hidden Understanding Models
Scott Miller | Richard Schwartz | Robert Bobrow | Robert Ingria
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994
Hidden Understanding Models of Natural Language
Scott Miller | Robert Bobrow | Robert Ingria | Richard Schwartz
32nd Annual Meeting of the Association for Computational Linguistics
1993
Example-Based Correction of Word Segmentation and Part of Speech Labelling
Tomoyoshi Matsukawa | Scott Miller | Ralph Weischedel
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993
BBN: Description of the PLUM System as Used for MUC-5
Ralph Weischedel | Damaris Ayuso | Sean Boisen | Heidi Fox | Robert Ingria | Tomoyoshi Matsukawa | Constantine Papageorgiou | Dawn MacLaughlin | Masaichiro Kitagawa | Tsutomu Sakai | June Abe | Hiroto Hosiho | Yoichi Miyamoto | Scott Miller
Fifth Message Understanding Conference (MUC-5): Proceedings of a Conference Held in Baltimore, Maryland, August 25-27, 1993
BBN’s PLUM Probabilistic Language Understanding System
Ralph Weischedel | Damaris Ayuso | Heidi Fox | Tomoyoshi Matsukawa | Constantine Papageorgiou | Dawn MacLaughlin | Masaichiro Kitagawa | Tsutomu Sakai | June Abe | Hiroto Hosiho | Yoichi Miyamoto | Scott Miller
TIPSTER TEXT PROGRAM: PHASE I: Proceedings of a Workshop held at Fredricksburg, Virginia, September 19-23, 1993
Co-authors
- Ralph Weischedel 9
- Heidi Fox 6
- Richard Schwartz 6
- Lance Ramshaw 5
- Elizabeth Boschee 4
- Robert Bobrow 3
- Robert Ingria 3
- Tomoyoshi Matsukawa 3
- Rebecca Stone 3
- Alex Zamanian 3
- June Abe 2
- Damaris Ayuso 2
- Kathryn Baker 2
- Joel Barry 2
- Michael Bloodgood 2
- Sergey Bratus 2
- Chris Callison-Burch 2
- Michael Crystal 2
- Bonnie Dorr 2
- Marjorie Freedman 2
- Thamme Gowda 2
- Hiroto Hosiho 2
- Masaichiro Kitagawa 2
- Lori Levin 2
- Constantine Lignos 2
- Dawn MacLaughlin 2
- Jonathan May 2
- Yoichi Miyamoto 2
- Constantine Papageorgiou 2
- Christine Piatko 2
- Tsutomu Sakai 2
- Daniel M. Bikel 1
- Jayadev Billa 1
- Sean Boisen 1
- Deana Burchfield 1
- Annabelle Carrell 1
- Julianne Chaloux 1
- Kai-Wei Chang 1
- Tongfei Chen 1
- Daniel Cohen 1
- Michael Collins 1
- Alex Comerford 1
- Cash Costello 1
- W. Bruce Croft 1
- Mark Dredze 1
- Kevin Duh 1
- Benjamin Van Durme 1
- Nathaniel Filardo 1
- Nathaniel W. Filardo 1
- Tim Finin 1
- Benjamin Glass 1
- Jethran Guinness 1
- Shudong Hao 1
- Craig Harman 1
- I-Hung Hsu 1
- Kuan-Hao Huang 1
- Heng Ji 1
- Banriskhem Kayang Khonglah 1
- Philipp Koehn 1
- Dawn Lawrie 1
- Yen-Chieh Lien 1
- Tom Lippincott 1
- Srikanth Madikeri 1
- M. Patrick Martin 1
- Chandler May 1
- James Mayfield 1
- Pratik Mehta 1
- Prem Natarajan 1
- Chester Palen-Michel 1
- Xiaoman Pan 1
- Nanyun Peng 1
- Adam Poliak 1
- Michael Pust 1
- Pushpendre Rastogi 1
- Rashmi Sankepally 1
- David Stallard 1
- The Annotation Group 1
- Max Thomas 1
- Ying-Ying Tran 1
- Travis Wolfe 1
- Ted Zhang 1