Abstract
Based on the idea that local contexts predict the same basic category across a language, we develop a simple method for comparing tagsets across corpora. The principal differences between tagsets are evidenced by variation in categories in one corpus in the same contexts where another corpus exhibits only a single tag. Such mismatches highlight differences in the definitions of tags, which are crucial when porting technology from one annotation scheme to another.
- Anthology ID:
- L08-1225
- Volume:
- Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
- Month:
- May
- Year:
- 2008
- Address:
- Marrakech, Morocco
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/210_paper.pdf
- Cite (ACL):
- Markus Dickinson and Charles Jochim. 2008. A Simple Method for Tagset Comparision. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
- Cite (Informal):
- A Simple Method for Tagset Comparision (Dickinson & Jochim, LREC 2008)