Reflection about a learning process is beneficial to students in higher education (Bub-nys, 2019). The importance of machine understanding of reflective texts grows as applications supporting students become more widespread. Nevertheless, due to the sensitive content, there is no public corpus available yet for the classification of text reflectiveness. We provide the first open-access corpus of reflective student essays in German. We collected essays from three different disciplines (Software Development, Ethics of Artificial Intelligence, and Teacher Training). We annotated the corpus at sentence level with binary reflective/non-reflective labels, using an iterative annotation process with linguistic and didactic specialists, mapping the reflective components found in the data to existing schemes and complementing them. We propose and evaluate linguistic features of reflectiveness and analyse their distribution within the resulted sentences according to their labels. Our contribution constitutes the first open-access corpus to help the community towards a unified approach for reflection detection.
We present a new corpus of tutorial dialogs on mathematical theorem proving that was collected in a Wizard-of-Oz setup. Our study is a follow up on a previous experiment conducted in a similar simulated environment. A major difference between the current and the previous experimental setup was that in this study we varied the presentation of the study-material with which the subjects were provided. One sub-group of the subjects was presented with a highly formalized presentation consisting mainly of formulas, while the other with a presentation mainly in natural language. Our goal was to obtain more data on the kind of mixed-language that is characteristic of informal mathematical discourse. We hypothesized that the language style of the subjects' interaction with the simulated system will reflect the style of presentation of the study-material. In the paper we briefly present the experimental setup, the corpus, and a preliminary quantitative result of the corpus analysis.