The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts

Nora Hollenstein, Maria Barrett, Marina Björnsdóttir


Abstract
Eye movement recordings from reading are one of the richest signals of human language processing. Corpora of eye movements during reading of contextualized running text are a way of making such records available for natural language processing purposes. Such corpora already exist in some languages. We present CopCo, the Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts. It is the first eye tracking corpus of its kind for the Danish language. CopCo includes 1,832 sentences with 34,897 tokens of Danish text extracted from a collection of speech manuscripts. This first release of the corpus contains eye tracking data from 22 participants. It will be extended continuously with more participants and texts from other genres. We assess the data quality of the recorded eye movements and find that the extracted features are in line with related research. The dataset is available here: https://osf.io/ud8s5/.
Anthology ID:
2022.lrec-1.182
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1712–1720
URL:
https://aclanthology.org/2022.lrec-1.182
Cite (ACL):
Nora Hollenstein, Maria Barrett, and Marina Björnsdóttir. 2022. The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1712–1720, Marseille, France. European Language Resources Association.
Cite (Informal):
The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts (Hollenstein et al., LREC 2022)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2022.lrec-1.182.pdf