Abstract
While disfluency detection has achieved notable success in recent years, it still suffers severely from data scarcity. To tackle this problem, we propose a novel semi-supervised approach that can utilize large amounts of unlabelled data. In this work, we propose a lightweight neural network that extracts hidden features based solely on self-attention, without any Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN). In addition, we use the unlabelled corpus to enhance performance, and we apply Generative Adversarial Network (GAN) training to enforce similar distributions between the labelled and unlabelled data. Experimental results show that our approach achieves significant improvements over strong baselines.

- Anthology ID: C18-1299
- Volume: Proceedings of the 27th International Conference on Computational Linguistics
- Month: August
- Year: 2018
- Address: Santa Fe, New Mexico, USA
- Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
- Venue: COLING
- Publisher: Association for Computational Linguistics
- Pages: 3529–3538
- URL: https://aclanthology.org/C18-1299
- Cite (ACL): Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, and Bo Xu. 2018. Semi-Supervised Disfluency Detection. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3529–3538, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal): Semi-Supervised Disfluency Detection (Wang et al., COLING 2018)
- PDF: https://preview.aclanthology.org/nschneid-patch-2/C18-1299.pdf
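
The abstract states that the feature extractor relies solely on self-attention, with no RNN or CNN, but it does not give the architecture's details. The following is therefore only a minimal sketch of generic single-head scaled dot-product self-attention (the mechanism popularized by the Transformer), not the authors' network. The function names and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical, and plain Python lists are used so the sketch needs no third-party libraries.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Normalize a row of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Generic single-head scaled dot-product self-attention (illustrative only).

    x: (seq_len x d_model) token representations.
    w_q, w_k, w_v: hypothetical (d_model x d_k) projection matrices.
    Returns a (seq_len x d_k) matrix in which each row is a weighted
    average of the value vectors of all tokens in the sequence.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    # Pairwise token affinities, shape (seq_len x seq_len).
    scores = matmul(q, transpose(k))
    # Scale by sqrt(d_k), then turn each row into attention weights.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    for row in self_attention(tokens, identity, identity, identity):
        print([round(val, 3) for val in row])
```

Each output row is a weighted average of all value vectors, so every token can draw on context from the whole sequence in a single step, without recurrence or convolution.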