Henna Paakki
2026
Building Multimodal Corpora Using Microtask Pipelines and Local Annotators
Helmiina Hotti | Raul Vazquez | Anna-Kaisa Jokipohja | Timo Kalliokoski | Henna Paakki | Rosa Suviranta | Tuomo Hiippala
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Multimodality, or how human communication and interaction combine multiple forms of expression, is studied across diverse fields of research. Many of these fields have underlined the need for large, richly annotated multimodal corpora to support empirical research. While language resources are increasingly annotated using microtask crowdsourcing, multimodal corpora remain largely reliant on expert annotators, which creates a bottleneck for scalability and broad applicability. This paper presents a novel hybrid approach to multimodal corpus annotation, leveraging the efficiency of microtask pipelines while preserving theoretical rigour. Our approach decomposes the annotation process into sequences of simple, well-instructed tasks, which are then performed by locally recruited non-expert annotators. We demonstrate the feasibility of this approach by presenting a pipeline for annotating the multimodal structure of school textbooks.
2024
Ensemble-based Multilingual Euphemism Detection: a Behavior-Guided Approach
Fedor Vitiugin | Henna Paakki
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)
This paper describes the system submitted by our team to the Multilingual Euphemism Detection Shared Task for the Fourth Workshop on Figurative Language Processing (FigLang 2024). We propose a novel model for multilingual euphemism detection that combines contextual and behavior-related features. The system classifies texts that potentially contain euphemistic terms using an ensemble classifier built on the outputs of behavior-related fine-tuned models. Our results show that, for this kind of task, our model outperforms baselines and state-of-the-art euphemism detection methods. On the leaderboard, our classification model achieved a macro-averaged F1 score of [anonymized], reaching the [anonymized] place.