Stefan L. Frank
Also published as: Stefan Frank
2026
The MultiplEYE Text Corpus: Towards a Diverse and Ever-Expanding Multilingual Text Corpus
Ramunė Kasperė | Anna Bondar | Sergiu Nisioi | Maja Stegenwallner-Schütz | Hanne B. Søndergaard Knudsen | Ana Matić | Eva Pavlinušić Vilus | Dorota Klimek-Jankowska | Chiara Tschirner | Not Battesta Soliva | Deborah N. Jakobi | Cui Ding | Dima Abu Romi | Cengiz Acarturk | Matilda Agdler | Anton Marius Alexandru | Mohd Faizan Ansari | Annalisa Arcidiacono | Elizabete Ausma Velta Barisa | Ana Bautista | Lisa Beinborn | Yevgeni Berzak | Nedeljka Bjelanović | Anna Isabelle Bothmann | Jan Brasser | Caterina Cacioli | Anila Çepani | Ilze Ceple | Adelina Cerpja | Dalí Chirino | Jan Chromý | Alessandro Corona Mendozza | Iria de-Dios-Flores | Nazik Dinçtopal Deniz | Ana Došen | Kristian Elersič | Inmaculada Fajardo | Zigmunds Freibergs | Angelina Ganebnaya | Shan Gao | Jéssica Gomes | Annjo Klungervik Greenall | Alba Haveriku | Miao He | Anamaria Hodivoianu | Yu-Yin Hsu | Amanda Isaksen | Andreia Janeiro | Kristine Jensen de López | Aleksandar Jevremovic | Vojislav Jovanovic | Hanna Kędzierska | Nik Kharlamov | Sara Kosutar | Nelda Kote | Vanja Kovic | Izabela Krejtz | Thyra Krosness | Oleksandra Kuvshynova | Eilam Lavy | Ella Lion | Marta Łockiewicz | Kaidi Lõo | Paula Luegi | Mircea Mihai Marin | Clara Martin | Svitlana Matvieieva | Diane C. Mézière | Xavier Mínguez-López | Valeriia Modina | Jurgita Motiejūnienė | Marie-Luise Müller | Tolgonai Nasipbek kyzy | Jamal Abdul Nasir | Johanne S. K. Nedergård | Ayşegül Özkan | Patrizia Paggio | Marijan Palmović | Maria Christina Panagiotopoulou | Alberto Parola | Helena Pérez | Klaudia Petersen | Anja Podlesek | Eva Pospíšilová | Marta Praulina | Mikuláš Preininger | Loredana Pungă | Diego Rossini | Špela Rot | Habib Sani Yahaya | Irina A. Sekerina | Anne Gabija Skadina | Jordi Solé-Casals | Lonneke van der Plas | Saara M. Varjopuro | Spyridoula Varlokosta | João Veríssimo | Oskari Juhapekka Virtanen | Nemanja Vračar | Mila Vulchanova | Ahmad Mustapha Wali | Peizheng Wu | Nilgün Yücel | Stefan Frank | Nora Hollenstein | Lena Jäger
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present the MultiplEYE Text Corpus, a large-scale, document-level, multi-parallel resource designed to advance cross-linguistic research on reading and language processing. The corpus provides paragraph-level alignment for texts in 39 languages spanning seven language families and seven scripts. Unlike many existing multilingual corpora, a substantial number of documents were originally written in languages other than English, reducing English-centric bias and supporting more typologically diverse investigations. The texts are carefully selected to balance linguistic richness with experimental feasibility, particularly for eye-tracking-while-reading studies. Developed within a multi-lab initiative, the MultiplEYE Text Corpus follows unified translation, alignment, and experimental design guidelines to ensure cross-linguistic comparability. Its inclusion of texts varying in type and difficulty enables research on discourse-level processing, genre effects, and individual differences across a wide range of languages. The text corpus and accompanying metadata provide a robust foundation for multilingual psycholinguistic and computational modeling research. Data and materials are publicly available at https://doi.org/10.23668/psycharchives.21750.
2025
BLiMP-NL: A Corpus of Dutch Minimal Pairs and Acceptability Judgments for Language Model Evaluation
Michelle Suijkerbuijk | Zoë Prins | Marianne de Heer Kloots | Willem Zuidema | Stefan L. Frank
Computational Linguistics, Volume 51, Issue 4 - December 2025
We present a corpus of 8,400 Dutch sentence pairs, intended primarily for the grammatical evaluation of language models. Each pair consists of a grammatical sentence and a minimally different ungrammatical sentence. The corpus covers 84 paradigms, classified into 22 syntactic phenomena. Ten sentence pairs of each paradigm were created by hand, while the remaining 90 were generated semi-automatically and manually validated afterwards. Nine of the 10 hand-crafted sentences of each paradigm are rated for acceptability by at least 30 participants each, and for the same 9 sentences reading times are recorded per word, through self-paced reading. Here, we report on the construction of the dataset, the measured acceptability ratings and reading times, as well as the extent to which a variety of language models can be used to predict both the ground-truth grammaticality and human acceptability ratings.
2024
Neural language model gradients predict event-related brain potentials
Stefan L. Frank
Proceedings of the Society for Computation in Linguistics 2024
2023
The Learnability of the Wh-Island Constraint in Dutch by a Long Short-Term Memory Network
Michelle Suijkerbuijk | Peter de Swart | Stefan L. Frank
Proceedings of the Society for Computation in Linguistics 2023
2022
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge
Danny Merkx | Stefan Frank | Mirjam Ernestus
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings. Importantly, in both experiments we show that the grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.
2021
Human Sentence Processing: Recurrence or Attention?
Danny Merkx | Stefan L. Frank
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Recurrent neural networks (RNNs) have long been an architecture of interest for computational models of human sentence processing. The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks, but little is known about its ability to model human language processing. We compare Transformer- and RNN-based language models’ ability to account for measures of human reading effort. Our analysis shows that Transformers outperform RNNs in explaining self-paced reading times and neural activity during the reading of English sentences, challenging the widely held idea that human sentence processing involves recurrent and immediate processing and providing evidence for cue-based retrieval.
2020
Less is Better: A cognitively inspired unsupervised model for language segmentation
Jinbiao Yang | Stefan L. Frank | Antal van den Bosch
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
Language users process utterances by segmenting them into many cognitive units, which vary in their sizes and linguistic levels. Although we can do such unitization/segmentation easily, its cognitive mechanism is still not clear. This paper proposes an unsupervised model, Less-is-Better (LiB), to simulate the human cognitive process with respect to language unitization/segmentation. LiB follows the principle of least effort and aims to build a lexicon which minimizes the number of unit tokens (alleviating the effort of analysis) and number of unit types (alleviating the effort of storage) at the same time on any given corpus. LiB’s workflow is inspired by empirical cognitive phenomena. The design makes the mechanism of LiB cognitively plausible and the computational requirement light-weight. The lexicon generated by LiB performs the best among different types of lexicons (e.g. ground-truth words) both from an information-theoretical view and a cognitive view, which suggests that the LiB lexicon may be a plausible proxy of the mental lexicon.
2019
Dependency Parsing with your Eyes: Dependency Structure Predicts Eye Regressions During Reading
Alessandro Lopopolo | Stefan L. Frank | Antal van den Bosch | Roel Willems
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Backward saccades during reading have been hypothesized to be involved in structural reanalysis, or to be related to the level of text difficulty. We test the hypothesis that backward saccades are involved in online syntactic analysis. If this is the case we expect that saccades will coincide, at least partially, with the edges of the relations computed by a dependency parser. In order to test this, we analyzed a large eye-tracking dataset collected while 102 participants read three short narrative texts. Our results show a relation between backward saccades and the syntactic structure of sentences.
Simulating Spanish-English Code-Switching: El Modelo Está Generating Code-Switches
Chara Tsoukala | Stefan L. Frank | Antal van den Bosch | Jorge Valdés Kroff | Mirjam Broersma
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Multilingual speakers are able to switch from one language to the other (“code-switch”) between or within sentences. Because the underlying cognitive mechanisms are not well understood, in this study we use computational cognitive modeling to shed light on the process of code-switching. We employed the Bilingual Dual-path model, a Recurrent Neural Network of bilingual sentence production (Tsoukala et al., 2017), and simulated sentence production in simultaneous Spanish-English bilinguals. Our first goal was to investigate whether the model would code-switch without being exposed to code-switched training input. The model indeed produced code-switches even without any exposure to such input and the patterns of code-switches are in line with earlier linguistic work (Poplack, 1980). The second goal of this study was to investigate an auxiliary phrase asymmetry that exists in Spanish-English code-switched production. Using this cognitive model, we examined a possible cause for this asymmetry. To our knowledge, this is the first computational cognitive model that aims to simulate code-switched sentence production.
2017
Data-Driven Broad-Coverage Grammars for Opinionated Natural Language Generation (ONLG)
Tomer Cagan | Stefan L. Frank | Reut Tsarfaty
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Opinionated Natural Language Generation (ONLG) is a new, challenging task that aims to automatically generate human-like, subjective responses to opinionated articles online. We present a data-driven architecture for ONLG that generates subjective responses triggered by users’ agendas, consisting of topics and sentiments, and based on wide-coverage automatically-acquired generative grammars. We compare three types of grammatical representations that we design for ONLG, which interleave different layers of linguistic information and are induced from a new, enriched dataset we developed. Our evaluation shows that generation with Relational-Realizational (Tsarfaty and Sima’an, 2008) inspired grammar gets better language model scores than lexicalized grammars à la Collins (2003), and that the latter gets better human-evaluation scores. We also show that conditioning the generation on topic models makes generated responses more relevant to the document content.
2014
Generating Subjective Responses to Opinionated Articles in Social Media: An Agenda-Driven Architecture and a Turing-Like Test
Tomer Cagan | Stefan L. Frank | Reut Tsarfaty
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media
2013
Word surprisal predicts N400 amplitude during reading
Stefan L. Frank | Leun J. Otten | Giulia Galli | Gabriella Vigliocco
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2012
Lexical surprisal as a general predictor of reading time
Irene Fernandez Monsalve | Stefan L. Frank | Gabriella Vigliocco
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
2010
Uncertainty Reduction as a Measure of Cognitive Processing Effort
Stefan Frank
Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics
2007
Co-authors
- Antal van den Bosch 3
- Tomer Cagan 2
- Danny Merkx 2
- Michelle Suijkerbuijk 2
- Reut Tsarfaty 2
- Gabriella Vigliocco 2
- Jamal Abdul Nasir 1
- Dima Abu Romi 1
- Cengiz Acarturk 1
- Matilda Agdler 1
- Anton Marius Alexandru 1
- Mohd Faizan Ansari 1
- Annalisa Arcidiacono 1
- Hanne B. Søndergaard Knudsen 1
- Elizabete Ausma Velta Barisa 1
- Not Battesta Soliva 1
- Ana Bautista 1
- Lisa Beinborn 1
- Yevgeni Berzak 1
- Nedeljka Bjelanović 1
- Anna Bondar 1
- Anna Isabelle Bothmann 1
- Jan Brasser 1
- Mirjam Broersma 1
- Caterina Cacioli 1
- Ilze Ceple 1
- Adelina Cerpja 1
- Dalí Chirino 1
- Jan Chromý 1
- Alessandro Corona Mendozza 1
- Nazik Dinctopal Deniz 1
- Cui Ding 1
- Ana Došen 1
- Kristian Elersič 1
- Mirjam Ernestus 1
- Inmaculada Fajardo 1
- Irene Fernandez Monsalve 1
- Zigmunds Freibergs 1
- Giulia Galli 1
- Angelina Ganebnaya 1
- Shan Gao 1
- Jéssica Gomes 1
- Annjo Klungervik Greenall 1
- Alba Haveriku 1
- Miao He 1
- Anamaria Hodivoianu 1
- Nora Hollenstein 1
- Yu-Yin Hsu 1
- Amanda Isaksen 1
- Deborah N. Jakobi 1
- Andreia Janeiro 1
- Kristine Jensen de López 1
- Aleksandar Jevremovic 1
- Vojislav Jovanovic 1
- Lena Ann Jäger 1
- Ramunė Kasperė 1
- Nik Kharlamov 1
- Dorota Klimek-Jankowska 1
- Marianne De Heer Kloots 1
- Nelda Kote 1
- Vanja Kovic 1
- Sara Košutar 1
- Izabela Krejtz 1
- Thyra Krosness 1
- Oleksandra Kuvshynova 1
- Hanna Kędzierska 1
- Eilam Lavy 1
- Ella Lion 1
- Alessandro Lopopolo 1
- Paula Luegi 1
- Kaidi Lõo 1
- Mircea Mihai Marin 1
- Clara Martin 1
- Ana Matić 1
- Svitlana Matvieieva 1
- Valeriia Modina 1
- Jurgita Motiejūnienė 1
- Diane C. Mézière 1
- Xavier Mínguez-López 1
- Marie-Luise Müller 1
- Tolgonai Nasipbek kyzy 1
- Johanne S. K. Nedergård 1
- Sergiu Nisioi 1
- Leun J. Otten 1
- Patrizia Paggio 1
- Marijan Palmović 1
- Maria Christina Panagiotopoulou 1
- Alberto Parola 1
- Eva Pavlinušić Vilus 1
- Klaudia Petersen 1
- Anja Podlesek 1
- Eva Pospíšilová 1
- Marta Praulina 1
- Mikuláš Preininger 1
- Zoë Prins 1
- Loredana Pungă 1
- Helena Pérez 1
- Diego Rossini 1
- Špela Rot 1
- Habib Sani Yahaya 1
- Irina A. Sekerina 1
- Anne Gabija Skadina 1
- Jordi Solé-Casals 1
- Maja Stegenwallner-Schütz 1
- Chiara Tschirner 1
- Chara Tsoukala 1
- Jorge Valdés Kroff 1
- Saara M. Varjopuro 1
- Spyridoula Varlokosta 1
- João Veríssimo 1
- Oskari Juhapekka Virtanen 1
- Nemanja Vračar 1
- Mila Vulchanova 1
- Ahmad Mustapha Wali 1
- Roel Willems 1
- Peizheng Wu 1
- Jinbiao Yang 1
- Nilgün Yücel 1
- Willem Zuidema 1
- Peter de Swart 1
- Iria de-Dios-Flores 1
- Lonneke van der Plas 1
- Anila Çepani 1
- Ayşegül Özkan 1
- Marta Łockiewicz 1