Abstract
We present an analysis of parser performance on speech data, comparing word type and token frequency distributions with written data, and evaluating parse accuracy by length of input string. We find that parser performance tends to deteriorate with increasing length of string, more so for spoken than for written texts. We train an alternative parsing model with added speech data and demonstrate improvements in accuracy on speech-units, with no deterioration in performance on written text.
- Anthology ID:
- W17-4604
- Volume:
- Proceedings of the Workshop on Speech-Centric Natural Language Processing
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Venue:
- WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 27–36
- URL:
- https://aclanthology.org/W17-4604
- DOI:
- 10.18653/v1/W17-4604
- Cite (ACL):
- Andrew Caines, Michael McCarthy, and Paula Buttery. 2017. Parsing transcripts of speech. In Proceedings of the Workshop on Speech-Centric Natural Language Processing, pages 27–36, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Parsing transcripts of speech (Caines et al., 2017)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/W17-4604.pdf
- Data
- English Web Treebank