Using active learning to expand training data for implicit discourse relation recognition
Yang Xu, Yu Hong, Huibin Ruan, Jianmin Yao, Min Zhang, Guodong Zhou
Abstract
We tackle discourse-level relation recognition, the problem of determining semantic relations between text spans. Implicit relation recognition is challenging because explicit relational clues are absent. Increasingly popular neural network techniques have proven effective for semantic encoding and are therefore widely used to improve semantic relation discrimination. However, learning to predict semantic relations at a deep level relies heavily on a large amount of training data, and the publicly available data in this field is limited in scale. In this paper, we follow Rutherford and Xue (2015) to expand the training data set using the corpus of explicitly-related arguments, by arbitrarily dropping the overtly presented discourse connectives. On this basis, we carry out a sampling experiment in which a simple active learning approach is used to select informative instances for data expansion. The goal is to verify whether the selective use of external data not only reduces the time spent on retraining but also ensures better system performance. Using the expanded training data, we retrain a convolutional neural network (CNN) based classifier, which is a simplified version of Qin et al. (2016)’s stacking gated relation recognizer. Experimental results show that expanding the training set with small-scale, carefully selected external data yields a substantial performance gain, with improvements of about 4% in accuracy and 3.6% in F-score. This allows a weak classifier to achieve performance comparable to that of state-of-the-art systems.
- Anthology ID:
- D18-1079
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 725–731
- URL:
- https://aclanthology.org/D18-1079
- DOI:
- 10.18653/v1/D18-1079
- Cite (ACL):
- Yang Xu, Yu Hong, Huibin Ruan, Jianmin Yao, Min Zhang, and Guodong Zhou. 2018. Using active learning to expand training data for implicit discourse relation recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 725–731, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Using active learning to expand training data for implicit discourse relation recognition (Xu et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/D18-1079.pdf
- Code
- AndreaXu0401/ALIDRC
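
The data-expansion idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or the linked repository: the function names (`drop_connective`, `entropy`, `select_informative`), the use of prediction entropy as the informativeness criterion, and the `predict_proba` callable are all assumptions made for illustration. The abstract says only that "a simple active learning approach" is used.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A pseudo-implicit example: the two arguments with the connective removed.
Pair = Tuple[str, str]


def drop_connective(arg1: str, connective: str, arg2: str) -> Pair:
    """Turn an explicit example into a pseudo-implicit one.

    The connective is simply discarded; only the two arguments are kept.
    """
    return (arg1.strip(), arg2.strip())


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_informative(
    pool: List[Pair],
    predict_proba: Callable[[Pair], Sequence[float]],
    k: int,
) -> List[Pair]:
    """Pick the k pool instances the current classifier is least sure about.

    Assumption: uncertainty is measured by prediction entropy. The paper's
    actual selection criterion may differ.
    """
    ranked = sorted(pool, key=lambda pair: entropy(predict_proba(pair)), reverse=True)
    return ranked[:k]
```

In such a setup, the selected pairs would be added to the implicit-relation training set, labeled with the relation signaled by the dropped connective, and the classifier would then be retrained. Selecting only a small, uncertain subset is what keeps the retraining cost down relative to using the whole explicit corpus.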