Abstract
This paper describes the architecture of the American National Corpus and the design decisions we have made in order to make the corpus easy to use with a variety of existing tools with varying functionality, and to allow for layering multiple annotations over the data. The overall goal of the ANC project is to provide an open linguistic infrastructure for American English, consisting of as many self-generated or contributed annotations of the data as possible together with derived. The availability of a wide variety of annotations for the same data and in a common format should significantly simplify the processing required to extract annotations from different sources and enable use of the ANC and its annotations with off-the-shelf software.- Anthology ID:
- L06-1338
- Volume:
- Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
- Month:
- May
- Year:
- 2006
- Address:
- Genoa, Italy
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/560_pdf.pdf
- DOI:
- Cite (ACL):
- Nancy Ide and Keith Suderman. 2006. Integrating Linguistic Resources: The American National Corpus Model. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
- Cite (Informal):
- Integrating Linguistic Resources: The American National Corpus Model (Ide & Suderman, LREC 2006)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/560_pdf.pdf