Retrieving Annotated Corpora for Corpus Annotation
Kyôsuke Yoshida, Taiichi Hashimoto, Takenobu Tokunaga, Hozumi Tanaka
Abstract
This paper introduces Bonsai, a tool that supports humans in annotating corpora with morphosyntactic information and in retrieving syntactic structures stored in a database. Integrating annotation and retrieval lets users annotate a new instance while looking back at already annotated sentences that share a similar morphosyntactic structure. We focus on the retrieval part of the system and describe a method that decomposes a large input query into smaller ones to improve retrieval efficiency. The proposed method is evaluated on the Penn Treebank corpus, showing significant improvements.
- Anthology ID:
- L04-1233
- Volume:
- Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
- Month:
- May
- Year:
- 2004
- Address:
- Lisbon, Portugal
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2004/pdf/403.pdf
- Cite (ACL):
- Kyôsuke Yoshida, Taiichi Hashimoto, Takenobu Tokunaga, and Hozumi Tanaka. 2004. Retrieving Annotated Corpora for Corpus Annotation. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), Lisbon, Portugal. European Language Resources Association (ELRA).
- Cite (Informal):
- Retrieving Annotated Corpora for Corpus Annotation (Yoshida et al., LREC 2004)