The Little Prince in 26 Languages: Towards a Multilingual Neuro-Cognitive Corpus
Sabrina Stehwien, Lena Henke, John Hale, Jonathan Brennan, Lars Meyer
Abstract
We present the Le Petit Prince Corpus (LPPC), a multilingual resource for research in (computational) psycho- and neurolinguistics. The corpus consists of the children’s story The Little Prince in 26 languages. The dataset is being built using state-of-the-art methods for speech and language processing and electroencephalography (EEG). The planned release of the LPPC dataset will include raw text annotated with dependency graphs in the Universal Dependencies standard, a near-natural-sounding synthetic spoken subset, and EEG recordings. We will use this corpus to conduct neurolinguistic studies that generalize across a wide range of languages, overcoming the typological constraints of traditional approaches. The planned release of the LPPC combines linguistic and EEG data for many languages using fully automatic methods, and thus constitutes a readily extendable resource that supports cross-linguistic and cross-disciplinary research.
- Anthology ID:
- 2020.lincr-1.6
- Volume:
- Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Emmanuele Chersoni, Barry Devereux, Chu-Ren Huang
- Venue:
- LiNCr
- Publisher:
- European Language Resources Association
- Pages:
- 43–49
- Language:
- English
- URL:
- https://aclanthology.org/2020.lincr-1.6
- Cite (ACL):
- Sabrina Stehwien, Lena Henke, John Hale, Jonathan Brennan, and Lars Meyer. 2020. The Little Prince in 26 Languages: Towards a Multilingual Neuro-Cognitive Corpus. In Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources, pages 43–49, Marseille, France. European Language Resources Association.
- Cite (Informal):
- The Little Prince in 26 Languages: Towards a Multilingual Neuro-Cognitive Corpus (Stehwien et al., LiNCr 2020)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2020.lincr-1.6.pdf