ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts
Lena Bolliger, David Reich, Patrick Haller, Deborah Jakobi, Paul Prasse, Lena Jäger
Abstract
Eye movements in reading play a crucial role in psycholinguistic research studying the cognitive mechanisms underlying human language processing. More recently, the tight coupling between eye movements and cognition has also been leveraged for language-related machine learning tasks such as the interpretability, enhancement, and pre-training of language models, as well as the inference of reader- and text-specific properties. However, scarcity of eye movement data and its unavailability at application time poses a major challenge for this line of research. Initially, this problem was tackled by resorting to cognitive models for synthesizing eye movement data. However, for the sole purpose of generating human-like scanpaths, purely data-driven machine-learning-based methods have proven to be more suitable. Following recent advances in adapting diffusion processes to discrete data, we propose ScanDL, a novel discrete sequence-to-sequence diffusion model that generates synthetic scanpaths on texts. By leveraging pre-trained word representations and jointly embedding both the stimulus text and the fixation sequence, our model captures multi-modal interactions between the two inputs. We evaluate ScanDL within- and across-dataset and demonstrate that it significantly outperforms state-of-the-art scanpath generation methods. Finally, we provide an extensive psycholinguistic analysis that underlines the model’s ability to exhibit human-like reading behavior. Our implementation is made available at https://github.com/DiLi-Lab/ScanDL.
- Anthology ID:
- 2023.emnlp-main.960
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15513–15538
- URL:
- https://aclanthology.org/2023.emnlp-main.960
- DOI:
- 10.18653/v1/2023.emnlp-main.960
- Cite (ACL):
- Lena Bolliger, David Reich, Patrick Haller, Deborah Jakobi, Paul Prasse, and Lena Jäger. 2023. ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15513–15538, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts (Bolliger et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2023.emnlp-main.960.pdf