Susanne Haaf


2018

Lightweight Grammatical Annotation in the TEI: New Perspectives
Piotr Bański | Susanne Haaf | Martin Mueller
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
Susanne Haaf
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper asks how corpus-based linguistic research may be enriched by exploiting conceptual text structures and layout as provided via TEI annotation. Examples of possible areas of research and usage scenarios are provided based on the German historical corpus of the Deutsches Textarchiv (DTA) project, which has been consistently tagged in accordance with the TEI Guidelines, more specifically with the DTA ›Base Format‹ (DTABf). The paper shows that by including TEI-XML structuring in corpus-based analyses, significant results can be observed for different linguistic phenomena, e.g. the development of conceptual text structures themselves, the syntactic embedding of terms in certain conceptual text structures, and phenomena of language change which become apparent via the layout of a text. The exemplary study carried out here shows some of the potential of exploiting TEI annotation for linguistic research, which might be kept in mind when making design decisions for new corpora as well as when working with existing TEI corpora.