Shu-Chuan Tseng


2024

2022

2014

Phone-aligned spoken corpora are indispensable language resources for quantitative linguistic analyses and automatic speech systems. However, producing this type of data resources is not an easy task due to high costs of time and man power as well as difficulties of applying valid annotation criteria and achieving reliable inter-labeler’s consistency. Among different types of spoken corpora, conversational speech that is often filled with extreme reduction and varying pronunciation variants is particularly challenging. By adopting a combined verification procedure, we obtained reasonably good annotation results. Preliminary phone boundaries that were automatically generated by a phone aligner were provided to human labelers for verifying. Instead of making use of the visualization of acoustic cues, the labelers should solely rely on their perceptual judgments to locate a position that best separates two adjacent phones. Impressionistic judgments in cases of reduction and segment deletion were helpful and necessary, as they balanced subtle nuance caused by differences in perception.

2013

2006

A dedicated resource, consisting of annotated speech tools, and workflow design, was developed for the detailed investigation of discourse phenomena in Taiwan Mandarin. The discourse phenomena have functions which are associated with positions in utterances, and temporal properties, and include discourse markers (“NAGE”, “NA”, e.g. “hesitation”, “utterance initiation”), discourse particles (“A”, e.g. “utterance finality”, “utterance continuity”, “focus”, etc.), and fillers (“UHN”, “hesitation”). The distribution of particles in relation to their position in utterances and the temporal properties of particles are investigated. The results of the investigation diverge considerably from claims in existing grammars of Mandarin with respect to utterance position, and show in general greater length than for regular syllables. These properties suggest the possibility of developing an automatic discourse item tagger.

2005

2001

2000