Teresa Bürkle
2023
MuLMS-AZ: An Argumentative Zoning Dataset for the Materials Science Domain
Timo Schrader | Teresa Bürkle | Sophie Henning | Sherry Tan | Matteo Finco | Stefan Grünewald | Maira Indrikova | Felix Hildebrand | Annemarie Friedrich
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)
Scientific publications follow conventionalized rhetorical structures. Classifying the Argumentative Zone (AZ), e.g., identifying whether a sentence states a Motivation, a Result or Background information, has been proposed to improve processing of scholarly documents. In this work, we adapt and extend this idea to the domain of materials science research. We present and release a new dataset of 50 manually annotated research articles. The dataset spans seven sub-topics and is annotated with a materials-science focused multi-label annotation scheme for AZ. We detail corpus statistics and demonstrate high inter-annotator agreement. Our computational experiments show that using domain-specific pre-trained transformer-based text encoders is key to high classification performance. We also find that AZ categories from existing datasets in other domains are transferable to varying degrees.
2021
A Corpus Study of Creating Rule-Based Enhanced Universal Dependencies for German
Teresa Bürkle | Stefan Grünewald | Annemarie Friedrich
Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop
In this paper, we present a first attempt at enriching German Universal Dependencies (UD) treebanks with enhanced dependencies. Similarly to the converter for English (Schuster and Manning, 2016), we develop a rule-based system for deriving enhanced dependencies from the basic layer, covering three linguistic phenomena: relative clauses, coordination, and raising/control. For quality control, we manually correct or validate a set of 196 sentences, finding that around 90% of added relations are correct. Our data analysis reveals that difficulties arise mainly due to inconsistencies in the basic layer annotations. We show that the English system is in general applicable to German data, but that adapting to the particularities of the German treebanks and language increases precision and recall by up to 10%. Comparing the application of our converter on gold standard dependencies vs. automatic parses, we find that F1 drops by around 10% in the latter setting due to error propagation. Finally, an enhanced UD parser trained on a converted treebank performs poorly when evaluated against our annotations, indicating that more work remains to be done to create gold standard enhanced German treebanks.