Abstract
The paper investigates the issue of portability of methods and results over treebanks in different languages and annotation formats. In particular, it addresses the problem of converting an Italian treebank, the Turin University Treebank (TUT), developed in dependency format, into the Penn Treebank format, in order to possibly exploit the tools and methods already developed and compare the adequacy of information encoding in the two formats. We describe the procedures for converting the two annotation formats and we present an experiment that evaluates some linguistic knowledge extracted from the two formats, namely sub-categorization frames.- Anthology ID:
- L06-1468
- Volume:
- Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
- Month:
- May
- Year:
- 2006
- Address:
- Genoa, Italy
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/750_pdf.pdf
- DOI:
- Cite (ACL):
- Cristina Bosco and Vincenzo Lombardo. 2006. Comparing linguistic information in treebank annotations. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
- Cite (Informal):
- Comparing linguistic information in treebank annotations (Bosco & Lombardo, LREC 2006)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/750_pdf.pdf