Tuomo Hiippala
2026
Building Multimodal Corpora Using Microtask Pipelines and Local Annotators
Helmiina Hotti | Raul Vazquez | Anna-Kaisa Jokipohja | Timo Kalliokoski | Henna Paakki | Rosa Suviranta | Tuomo Hiippala
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Multimodality, or how human communication and interaction combine multiple forms of expression, is studied across diverse fields of research. Many of these fields have underlined the need for large, richly annotated multimodal corpora to support empirical research. While language resources are increasingly annotated using microtask crowdsourcing, multimodal corpora remain largely reliant on expert annotators, which creates a bottleneck for scalability and broad applicability. This paper presents a novel hybrid approach to multimodal corpus annotation, leveraging the efficiency of microtask pipelines while preserving theoretical rigour. Our approach decomposes the annotation process into sequences of simple, well-instructed tasks, which are then performed by locally recruited non-expert annotators. We demonstrate the feasibility of this approach by presenting a pipeline for annotating the multimodal structure of school textbooks.
2022
Developing a tool for fair and reproducible use of paid crowdsourcing in the digital humanities
Tuomo Hiippala | Helmiina Hotti | Rosa Suviranta
Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
This system demonstration paper describes ongoing work on a tool for fair and reproducible use of paid crowdsourcing in the digital humanities. Paid crowdsourcing is widely used in natural language processing and computer vision, but has rarely been applied in the digital humanities due to ethical concerns. We discuss concerns associated with paid crowdsourcing and describe how we seek to mitigate them in designing the tool and crowdsourcing pipelines. We demonstrate how the tool may be used to create annotations for diagrams, a complex mode of expression whose description requires human input.
2021
Applied Language Technology: NLP for the Humanities
Tuomo Hiippala
Proceedings of the Fifth Workshop on Teaching NLP
This contribution describes a two-course module that seeks to provide humanities majors with a basic understanding of language technology and its applications using Python. The learning materials consist of interactive Jupyter Notebooks and accompanying YouTube videos, which are openly available with a Creative Commons licence.
2018
Enhancing the AI2 Diagrams Dataset Using Rhetorical Structure Theory
Tuomo Hiippala | Serafina Orekhova
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)