Marco Spruit


2022

Looking from the Inside: How Children Render Character’s Perspectives in Freely Told Fantasy Stories
Max van Duijn | Bram van Dijk | Marco Spruit
Proceedings of the 4th Workshop of Narrative Understanding (WNU2022)

Story characters not only perform actions, they typically also perceive, feel, think, and communicate. Here we are interested in how children render characters’ perspectives when freely telling a fantasy story. Drawing on a sample of 150 narratives elicited from Dutch children aged 4-12, we provide an inventory of 750 instances of character-perspective representation (CPR), distinguishing fourteen different types. Firstly, we observe that character perspectives are ubiquitous in freely told children’s stories and take more varied forms than traditional frameworks can accommodate. Secondly, we discuss variation in the use of different types of CPR across age groups, finding that character perspectives are being fleshed out in more advanced and diverse ways as children grow older. Thirdly, we explore whether such variation can be meaningfully linked to automatically extracted linguistic features, thereby probing the potential for using automated tools from NLP to extract and classify character perspectives in children’s stories.

UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation
Injy Sarhan | Pablo Mosteiro | Marco Spruit
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents our strategy for SemEval-2022 Task 3, PreTENS: Presupposed Taxonomies Evaluating Neural Network Semantics. The goal of the task is to determine whether a sentence is acceptable depending on the taxonomic relationship that holds between a noun pair contained in the sentence. For sub-task 1 (binary classification), we propose an effective way to enhance the robustness and generalizability of language models for better classification on this downstream task: a two-stage fine-tuning procedure for the ELECTRA language model using data augmentation techniques. Rigorous experiments are carried out using multi-task learning and data-enriched fine-tuning. Experimental results demonstrate that our proposed model, UU-Tax, generalizes well to this downstream task. For sub-task 2 (regression), we propose a simple classifier trained on features obtained from the Universal Sentence Encoder (USE). In addition to describing the submitted systems, we discuss other experiments that employ pre-trained language models and data augmentation techniques. For both sub-tasks, we perform error analysis to further understand the behaviour of the proposed models. We achieved a global F1_Binary score of 91.25% in sub-task 1 and a rho score of 0.221 in sub-task 2.
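The sub-task 2 pipeline described above (sentence-encoder features fed to a simple downstream model) can be sketched as follows. This is a toy illustration, not the submitted UU-Tax system: random vectors stand in for USE embeddings (in practice one would embed each sentence, e.g. via tensorflow_hub, into 512-dimensional vectors), the target scores are synthetic, and ridge regression stands in for the unspecified simple classifier.

```python
import numpy as np

# Hypothetical stand-in for Universal Sentence Encoder output:
# 200 sentences, each embedded as a 512-dimensional vector.
rng = np.random.default_rng(0)
n_train, dim = 200, 512
X = rng.normal(size=(n_train, dim))

# Synthetic acceptability scores (a linear signal plus noise),
# standing in for the gold regression labels of sub-task 2.
true_w = rng.normal(size=dim)
y = X @ true_w + 0.1 * rng.normal(size=n_train)

# A "simple classifier/regressor" on the fixed features: ridge
# regression solved in closed form, w = (X'X + lam*I)^-1 X'y.
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ y)

preds = X @ w  # predicted scores for the training sentences
```

Because the sentence encoder is frozen, only the small linear head is trained, which keeps this kind of baseline cheap compared with fine-tuning a full language model.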

2019

UU_TAILS at MEDIQA 2019: Learning Textual Entailment in the Medical Domain
Noha Tawfik | Marco Spruit
Proceedings of the 18th BioNLP Workshop and Shared Task

This article describes the participation of the UU_TAILS team in the 2019 MEDIQA challenge, intended to improve domain-specific models in medical and clinical NLP. The challenge consists of three tasks: natural language inference (NLI), recognizing question entailment (RQE), and question answering (QA). Our team participated in tasks 1 and 2, and our best runs achieved accuracies of 0.852 and 0.584, respectively, on the test sets. The models proposed for task 1 relied on BERT embeddings and different ensemble techniques. For the RQE task, we trained a traditional multilayer perceptron network on embeddings generated by the Universal Sentence Encoder.
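The RQE setup above (a traditional multilayer perceptron over fixed sentence embeddings) can be sketched in plain numpy. This is a hypothetical illustration, not the submitted model: random vectors stand in for Universal Sentence Encoder embeddings of question pairs, the entailment labels are synthetic, and the layer sizes and learning rate are arbitrary choices.

```python
import numpy as np

# Stand-in features: 400 question pairs, each a 64-dim vector
# (in the real system, USE embeddings of the two questions).
rng = np.random.default_rng(1)
n, dim, hidden = 400, 64, 32
X = rng.normal(size=(n, dim))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic entailment labels

# One hidden layer (tanh) + sigmoid output: a minimal MLP.
W1 = rng.normal(scale=0.1, size=(dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=hidden)
b2 = 0.0

def forward(X):
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return p, h

# Full-batch gradient descent on mean binary cross-entropy.
lr = 0.5
for _ in range(2000):
    p, h = forward(X)
    g = (p - y) / n                      # dL/dlogit per sample
    grad_W2 = h.T @ g
    grad_b2 = g.sum()
    grad_h = np.outer(g, W2) * (1 - h**2)  # backprop through tanh
    grad_W1 = X.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

acc = float(((forward(X)[0] > 0.5) == y).mean())  # training accuracy
```

As in the USE-based regressor for the SemEval sub-task, the encoder is treated as a fixed feature extractor; only the small MLP head is trained.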