Melanie Andresen


2020

Modeling Ambiguity with Many Annotators and Self-Assessments of Annotator Certainty
Melanie Andresen | Michael Vauth | Heike Zinsmeister
Proceedings of the 14th Linguistic Annotation Workshop

Most annotation efforts assume that annotators will agree on labels, if the annotation categories are well-defined and documented in annotation guidelines. However, this is not always true. For instance, content-related questions such as ‘Is this sentence about topic X?’ are unlikely to elicit the same answer from all annotators. Additional specifications in the guidelines are helpful to some extent, but can soon get overspecified by rules that cannot be justified by a research question. In this study, we model the semantic category ‘illness’ and its use in a gradual way. For this purpose, we (i) ask many annotators (30 votes per item, 960 items) for their opinion in a crowdsourcing experiment, (ii) ask annotators to indicate their certainty with respect to their annotation, and (iii) compare this across two different text types. We show that results of multiple annotations and average annotator certainty correlate, but many ambiguities can only be captured if several people contribute. The annotated data allow us to filter for sentences with high or low agreement and analyze causes of disagreement, thus getting a better understanding of people’s perception of illness—as an example of a semantic category—as well as of the content of our annotated texts.

2018

Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Agata Savary | Carlos Ramisch | Jena D. Hwang | Nathan Schneider | Melanie Andresen | Sameer Pradhan | Miriam R. L. Petruck
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

2017

Approximating Style by N-gram-based Annotation
Melanie Andresen | Heike Zinsmeister
Proceedings of the Workshop on Stylistic Variation

The concept of style is much debated in theoretical as well as empirical terms. From an empirical perspective, the key question is how to operationalize style and thus make it accessible for annotation and quantification. In authorship attribution, many different approaches have successfully resolved this issue at the cost of linguistic interpretability: The resulting algorithms may be able to distinguish one language variety from the other, but do not give us much information on their distinctive linguistic properties. We approach the issue of interpreting stylistic features by extracting linear and syntactic n-grams that are distinctive for a language variety. We present a study that exemplifies this process by a comparison of the German academic languages of linguistics and literary studies. Overall, our findings show that distinctive n-grams can be related to linguistic categories. The results suggest that the style of German literary studies is characterized by nominal structures and the style of linguistics by verbal ones.

The Benefit of Syntactic vs. Linear N-grams for Linguistic Description
Melanie Andresen | Heike Zinsmeister
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)