Abstract
This paper presents the Tübingen Baumbank des Deutschen Diachron (TüBa-D/DC), a linguistically annotated corpus of selected diachronic materials from the German Gutenberg Project. The corpus was annotated automatically by a suite of NLP tools integrated into WebLicht, the linguistic chaining tool used in CLARIN-D. Annotation quality was evaluated manually for a subcorpus ranging from Middle High German to Modern High German. Integrating the TüBa-D/DC into the CLARIN-D infrastructure involves metadata provision and harvesting as well as sustainable data storage at the Tübingen CLARIN-D center. The paper also gives an overview of the ways in which the TüBa-D/DC data can be accessed, and describes in detail methods for full-text search of the metadata and the object data and for annotation-based search of the object data. WebLicht's service-oriented architecture is used as an integrated environment for annotation-based search of the TüBa-D/DC. WebLicht thus serves not only as the annotation platform for the TüBa-D/DC but also as a generic user interface for accessing and visualizing it.
- Anthology ID:
- L12-1033
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1622–1627
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/166_Paper.pdf
- Cite (ACL):
- Erhard Hinrichs and Thomas Zastrow. 2012. Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1622–1627, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC (Hinrichs & Zastrow, LREC 2012)