The ManDi Corpus: A Spoken Corpus of Mandarin Regional Dialects

Liang Zhao; Eleanor Chodroff

Abstract

In this paper, we introduce the ManDi Corpus, a spoken corpus of regional Mandarin dialects and Standard Mandarin. The corpus currently contains 357 recordings (about 9.6 hours) from 36 speakers. The recorded materials are monosyllabic words, disyllabic words, short sentences, a short passage, and a poem, each produced in Standard Mandarin and in one of six regional Mandarin dialects: Beijing, Chengdu, Jinan, Taiyuan, Wuhan, and Xi’an Mandarin. The corpus was collected remotely using participant-controlled smartphone recording apps. Word- and phone-level alignments were generated using Praat and the Montreal Forced Aligner. A pilot study of dialect-specific tone systems showed that, with a practicable design and decent recording quality, remotely collected speech data can be suitable for analyzing relative patterns in acoustic-phonetic realization. The corpus is available on OSF (https://osf.io/fgv4w/) for non-commercial use under a CC BY-NC 3.0 license.

Anthology ID:: 2022.lrec-1.213
Volume:: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:: June
Year:: 2022
Address:: Marseille, France
Editors:: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:: LREC
Publisher:: European Language Resources Association
Pages:: 1985–1990
URL:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.lrec-1.213/
Cite (ACL):: Liang Zhao and Eleanor Chodroff. 2022. The ManDi Corpus: A Spoken Corpus of Mandarin Regional Dialects. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1985–1990, Marseille, France. European Language Resources Association.
Cite (Informal):: The ManDi Corpus: A Spoken Corpus of Mandarin Regional Dialects (Zhao & Chodroff, LREC 2022)
PDF:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.lrec-1.213.pdf