Kathrin Beck


2014

Adapting a part-of-speech tagset to non-standard text: The case of STTS
Heike Zinsmeister | Ulrich Heid | Kathrin Beck
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Stuttgart-Tübingen TagSet (STTS) is a de-facto standard for the part-of-speech tagging of German texts. Since its first publication in 1995, STTS has been used in a variety of annotation projects, some of which have adapted the tagset slightly for their specific needs. Recently, the focus of many projects has shifted from the analysis of newspaper text to that of non-standard varieties such as user-generated content, historical texts, and learner language. These text types contain linguistic phenomena that are missing from or are only suboptimally covered by STTS; in a community effort, German NLP researchers have therefore proposed additions to and modifications of the tagset that will handle these phenomena more appropriately. In addition, they have discussed alternative ways of tag assignment in terms of bipartite tags (stem, token) for historical texts and tripartite tags (lexicon, morphology, distribution) for learner texts. In this article, we report on this ongoing activity, addressing methodological issues and discussing selected phenomena and their treatment in the tagset adaptation process.

2010

Chunking German: An Unsolved Problem
Sandra Kübler | Kathrin Beck | Erhard Hinrichs | Heike Telljohann
Proceedings of the Fourth Linguistic Annotation Workshop