Abstract
Context influences how we engage with multimodal documents. Describing and processing the content of images is highly correlated with the goals of the discourse. It is known that these underlying cognitive processes can be tapped into by looking at eye movements, but the connection between discourse goals and eye movements is a missing link. In this study, we carry out both augmented reality and webcam-based eye-tracking experiments during comprehension and production tasks. We build on computational frameworks of coherence in text and images that study causal, logical, elaborative, and temporal inferences to understand how eye gaze patterns and coherence relations influence each other. No state-of-the-art techniques exist to analyze eye movements in multimodal language settings. So, we introduce a new eye gaze pattern ranking algorithm and a semantic gaze visualization technique to study this phenomenon better. Our results demonstrate that eye gaze durations are person-dependent, and during comprehension and production, ranked gaze patterns are significantly different for different types of coherence relations. We also present a case study of how Multimodal Large Language Models represent this connection of eye gaze patterns and coherence relations. We make all of our code and novel analysis tools available through https://github.com/Merterm/eye-gaze-coherence.
- Anthology ID:
- 2024.lrec-main.1263
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 14494–14512
- URL:
- https://aclanthology.org/2024.lrec-main.1263
- Cite (ACL):
- Mert Inan and Malihe Alikhani. 2024. Seeing Eye-to-Eye: Cross-Modal Coherence Relations Inform Eye-gaze Patterns During Comprehension & Production. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14494–14512, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Seeing Eye-to-Eye: Cross-Modal Coherence Relations Inform Eye-gaze Patterns During Comprehension & Production (Inan & Alikhani, LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2024.lrec-main.1263.pdf