Abstract
This paper describes the design, collection, orthographic transcription, and phonetic annotation of SpiCE, a new corpus of conversational Cantonese-English bilingual speech recorded in Vancouver, Canada. The corpus includes high-quality recordings of 34 early bilinguals in both English and Cantonese; to date, 27 have been recorded for a total of 19 hours of participant speech. Participants completed a sentence reading task, storyboard narration, and conversational interview in each language. Transcription and annotation for the corpus are currently underway. Transcripts produced with Google Cloud Speech-to-Text are available for all participants, and will be included in the initial SpiCE corpus release. Hand-corrected orthographic transcripts and force-aligned phonetic transcripts will be released periodically, and upon completion for all recordings, comprise the second release of the corpus. As an open-access language resource, SpiCE will promote bilingualism research for a typologically distinct pair of languages, of which Cantonese remains understudied despite there being millions of speakers around the world. The SpiCE corpus is especially well-suited for phonetic research on conversational speech, and enables researchers to study cross-language within-speaker phenomena for a diverse group of early Cantonese-English bilinguals. These are areas with few existing high-quality resources.
- Anthology ID:
- 2020.lrec-1.503
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association
- Note:
- Pages:
- 4089–4095
- Language:
- English
- URL:
- https://preview.aclanthology.org/add_missing_videos/2020.lrec-1.503/
- DOI:
- Cite (ACL):
- Khia A. Johnson, Molly Babel, Ivan Fong, and Nancy Yiu. 2020. SpiCE: A New Open-Access Corpus of Conversational Bilingual Speech in Cantonese and English. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4089–4095, Marseille, France. European Language Resources Association.
- Cite (Informal):
- SpiCE: A New Open-Access Corpus of Conversational Bilingual Speech in Cantonese and English (Johnson et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2020.lrec-1.503.pdf