@inproceedings{jamshid-lou-johnson-2020-improving,
title = "Improving Disfluency Detection by Self-Training a Self-Attentive Model",
author = "Jamshid Lou, Paria and
Johnson, Mark",
editor = "Jurafsky, Dan and
Chai, Joyce and
Schluter, Natalie and
Tetreault, Joel",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.acl-main.346/",
doi = "10.18653/v1/2020.acl-main.346",
pages = "3754--3763",
abstract = "Self-attentive neural syntactic parsers using contextualized word embeddings (e.g. ELMo or BERT) currently produce state-of-the-art results in joint parsing and disfluency detection in speech transcripts. Since the contextualized word embeddings are pre-trained on a large amount of unlabeled data, using additional unlabeled data to train a neural model might seem redundant. However, we show that self-training {---} a semi-supervised technique for incorporating unlabeled data {---} sets a new state-of-the-art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations. We also show that ensembling self-trained parsers provides further gains for disfluency detection."
}
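
For readers unfamiliar with the technique named in the abstract, below is a minimal, hypothetical sketch of confidence-based self-training and ensembling. It is not the authors' code: the paper self-trains a BERT-based self-attentive parser on speech transcripts, whereas this sketch stands in a generic scikit-learn classifier over synthetic features, and the confidence threshold, number of rounds, and bootstrap-based ensemble diversity are illustrative assumptions only.

```python
# Hypothetical sketch of self-training (semi-supervised use of unlabeled data)
# and ensembling, loosely mirroring the recipe described in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for featurized tokens with binary (fluent/disfluent) labels.
X_labeled = rng.normal(size=(200, 16))
y_labeled = (X_labeled[:, 0] > 0).astype(int)
X_unlabeled = rng.normal(size=(1000, 16))


def self_train(X_l, y_l, X_u, rounds=3, threshold=0.9):
    """Confidence-based self-training: pseudo-label unlabeled data,
    keep confident predictions, retrain, and repeat."""
    model = LogisticRegression(max_iter=1000).fit(X_l, y_l)
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        proba = model.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Move confidently pseudo-labeled examples into the labeled pool.
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, proba[confident].argmax(axis=1)])
        X_u = X_u[~confident]
        model = LogisticRegression(max_iter=1000).fit(X_l, y_l)
    return model


# Ensembling: the abstract reports further gains from ensembling self-trained
# parsers; here ensemble members are diversified by bootstrap resampling.
models = []
for seed in range(3):
    member_rng = np.random.default_rng(seed)
    idx = member_rng.choice(len(X_labeled), size=len(X_labeled), replace=True)
    models.append(self_train(X_labeled[idx], y_labeled[idx], X_unlabeled))

avg_proba = np.mean([m.predict_proba(X_unlabeled[:5]) for m in models], axis=0)
print(avg_proba.argmax(axis=1))  # ensembled pseudo-labels for a few examples
```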