A Modality Lexicon and its use in Automatic Tagging
Kathryn Baker, Michael Bloodgood, Bonnie Dorr, Nathaniel W. Filardo, Lori Levin, Christine Piatko
Abstract
This paper describes our resource-building results for an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation. Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme. Our annotation scheme is based on identifying three components of modality: a trigger, a target and a holder. We describe how our modality lexicon was produced semi-automatically, expanding from an initial hand-selected list of modality trigger words and phrases. The resulting expanded modality lexicon is being made publicly available. We demonstrate that one of the taggers, a structure-based tagger, achieves precision of around 86% (depending on genre) when tagging a standard LDC data set. In a machine translation application, using the structure-based tagger to annotate English modalities on an English-Urdu training corpus improved the translation quality score for Urdu by 0.3 BLEU points in the face of sparse training data.
 - Anthology ID:
 - L10-1309
 - Volume:
 - Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
 - Month:
 - May
 - Year:
 - 2010
 - Address:
 - Valletta, Malta
 - Editors:
 - Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
 - Venue:
 - LREC
 - Publisher:
 - European Language Resources Association (ELRA)
 - URL:
 - http://www.lrec-conf.org/proceedings/lrec2010/pdf/446_Paper.pdf
 - Cite (ACL):
 - Kathryn Baker, Michael Bloodgood, Bonnie Dorr, Nathaniel W. Filardo, Lori Levin, and Christine Piatko. 2010. A Modality Lexicon and its use in Automatic Tagging. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
 - Cite (Informal):
 - A Modality Lexicon and its use in Automatic Tagging (Baker et al., LREC 2010)