Refining rtMRI Landmark-Based Vocal Tract Contour Labels with FCN-Based Smoothing and Point-to-Curve Projection

Mushaffa Rasyid Ridha, Sakriani Sakti


Abstract
Advanced real-time Magnetic Resonance Imaging (rtMRI) enables researchers to study dynamic articulatory movements during speech production with high temporal resolution. However, accurately outlining articulator contours in high-frame-rate rtMRI is challenging because of data scalability and image quality issues, which make both manual and automatic labeling difficult. The widely used, publicly available USC-TIMIT dataset offers rtMRI data with landmark-based contour labels derived from unsupervised region segmentation using a spatial frequency domain representation and gradient descent optimization. Unfortunately, these labels contain occasional errors, yet many contour detection methods have been trained and tested against them as ground truth even though they are not true gold labels, and the resulting contour data largely remain undisclosed to the public. This paper offers a refinement of the landmark-based vocal-tract contour labels by employing outlier removal, fully convolutional network (FCN)-based smoothing, and a landmark point-to-edge curve projection technique. Since there is no established ground-truth label, we evaluate the quality of the new labels through subjective assessments of several contour areas, comparing them to the existing data labels.
Anthology ID:
2024.lrec-main.1204
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
13796–13802
URL:
https://aclanthology.org/2024.lrec-main.1204
Cite (ACL):
Mushaffa Rasyid Ridha and Sakriani Sakti. 2024. Refining rtMRI Landmark-Based Vocal Tract Contour Labels with FCN-Based Smoothing and Point-to-Curve Projection. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13796–13802, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Refining rtMRI Landmark-Based Vocal Tract Contour Labels with FCN-Based Smoothing and Point-to-Curve Projection (Ridha & Sakti, LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2024.lrec-main.1204.pdf