Abstract
As users become more accustomed to using their mobile devices to organize and schedule their lives, there is more of a demand for applications that can make that process easier. Automatic speech recognition technology has already been developed to enable essentially unlimited vocabulary in a mobile setting. Understanding the words that are spoken is the next challenge. In this paper, we describe efforts to develop a dataset and classifier to recognize named entities in speech. Using sets of both real and simulated data, in conjunction with a very large set of real named entities, we created a challenging corpus of training and test data. We use these data to develop a classifier to identify names and locations on a word-by-word basis. In this paper, we describe the process of creating the data and determining a set of features to use for named entity recognition. We report on our classification performance on these data, as well as point to future work in improving all aspects of the system.- Anthology ID:
- L10-1194
- Volume:
- Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
- Month:
- May
- Year:
- 2010
- Address:
- Valletta, Malta
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/280_Paper.pdf
- DOI:
- Cite (ACL):
- Joseph Polifroni, Imre Kiss, and Mark Adler. 2010. Bootstrapping Named Entity Extraction for the Creation of Mobile Services. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
- Cite (Informal):
- Bootstrapping Named Entity Extraction for the Creation of Mobile Services (Polifroni et al., LREC 2010)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/280_Paper.pdf