Parsing Speech: a Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen Livescu, Mari Ostendorf

Abstract
In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, attachment decisions are most improved, and transcription errors obscure gains from prosody.
Anthology ID:
N18-1007
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
69–81
URL:
https://aclanthology.org/N18-1007
DOI:
10.18653/v1/N18-1007
Cite (ACL):
Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen Livescu, and Mari Ostendorf. 2018. Parsing Speech: a Neural Approach to Integrating Lexical and Acoustic-Prosodic Information. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 69–81, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Parsing Speech: a Neural Approach to Integrating Lexical and Acoustic-Prosodic Information (Tran et al., NAACL 2018)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/N18-1007.pdf
Video:
https://preview.aclanthology.org/teach-a-man-to-fish/N18-1007.mp4
Code:
shtoshni92/speech_parsing