Abstract
We used four Part-of-Speech taggers, which are available for research purposes and were originally trained on text to tag a corpus of transcribed multiparty spoken dialogues. The assigned tags were then manually corrected. The correction was first used to evaluate the four taggers, then to retrain them. Despite limited resources in time, money and annotators we reached results comparable to those reported for the taggers on text. Based on our experience we present guidelines to produce reliably POS tagged corpora of new domains.- Anthology ID:
- L06-1201
- Volume:
- Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
- Month:
- May
- Year:
- 2006
- Address:
- Genoa, Italy
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/345_pdf.pdf
- DOI:
- Cite (ACL):
- Margot Mieskes and Michael Strube. 2006. Part-of-Speech Tagging of Transcribed Speech. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
- Cite (Informal):
- Part-of-Speech Tagging of Transcribed Speech (Mieskes & Strube, LREC 2006)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/345_pdf.pdf