STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction
Junjie Yu, Xing Wang, Jiangjiang Zhao, Chunjie Yang, Wenliang Chen
Abstract
We present a simple yet effective self-training approach, named STAD, for low-resource relation extraction. The approach first classifies the auto-annotated instances into two groups, confident instances and uncertain instances, according to the probabilities predicted by a teacher model. In contrast to most previous studies, which mainly use only the confident instances for self-training, we also make use of the uncertain instances. To this end, we propose a method to identify ambiguous but useful instances among the uncertain instances and then divide the relations into a candidate-label set and a negative-label set for each ambiguous instance. Next, we propose a set-negative training method on the negative-label sets for the ambiguous instances and a positive training method for the confident instances. Finally, a joint-training method is proposed to build the final relation extraction system on all data. Experimental results on two widely used datasets, SemEval-2010 Task-8 and Re-TACRED, under low-resource settings demonstrate that this new self-training approach achieves significant and consistent improvements compared with several competitive self-training systems.
- Anthology ID:
- 2022.coling-1.178
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 2044–2054
- URL:
- https://aclanthology.org/2022.coling-1.178
- Cite (ACL):
- Junjie Yu, Xing Wang, Jiangjiang Zhao, Chunjie Yang, and Wenliang Chen. 2022. STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2044–2054, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction (Yu et al., COLING 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2022.coling-1.178.pdf
- Code:
- jjyunlp/stad
- Data:
- Re-TACRED, SemEval-2010 Task-8
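
The pipeline summarized in the abstract (grouping auto-annotated instances by teacher probability, building a candidate-label set and a negative-label set for each ambiguous instance, then combining positive and set-negative training) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the thresholds `confident_threshold` and `candidate_mass`, the top-k rule for choosing candidate labels, and the `-log(1 - p)` form of the negative loss are all assumptions made for illustration; the paper and the jjyunlp/stad repository define the actual criteria.

```python
import numpy as np

EPS = 1e-12  # numerical guard for log(); an implementation detail of this sketch


def split_by_confidence(teacher_probs, confident_threshold=0.9,
                        top_k=2, candidate_mass=0.9):
    """Group instances and build candidate-label masks.

    Assumed rule (hypothetical, not taken from the paper):
      - confident: the top-1 probability reaches confident_threshold
      - ambiguous: the top-k probabilities together reach candidate_mass
      - discarded: everything else
    """
    teacher_probs = np.asarray(teacher_probs, dtype=float)
    n, num_relations = teacher_probs.shape
    order = np.argsort(-teacher_probs, axis=1)  # labels sorted by descending prob
    groups = []
    candidate_mask = np.zeros((n, num_relations), dtype=bool)
    for i in range(n):
        best = order[i, 0]
        top = order[i, :top_k]
        if teacher_probs[i, best] >= confident_threshold:
            groups.append("confident")
            candidate_mask[i, best] = True
        elif teacher_probs[i, top].sum() >= candidate_mass:
            groups.append("ambiguous")
            candidate_mask[i, top] = True
        else:
            groups.append("discarded")
    return groups, candidate_mask


def positive_loss(student_probs, label):
    """Standard cross-entropy on a single pseudo-label."""
    return -np.log(student_probs[label] + EPS)


def set_negative_loss(student_probs, candidate_mask):
    """Push probability away from every label outside the candidate set."""
    negative = ~candidate_mask
    return -np.sum(np.log(1.0 - student_probs[negative] + EPS))


def joint_loss(student_probs, groups, candidate_mask):
    """Sum positive loss over confident and set-negative loss over ambiguous instances."""
    student_probs = np.asarray(student_probs, dtype=float)
    total = 0.0
    for i, group in enumerate(groups):
        if group == "confident":
            label = int(np.argmax(candidate_mask[i]))
            total += positive_loss(student_probs[i], label)
        elif group == "ambiguous":
            total += set_negative_loss(student_probs[i], candidate_mask[i])
    return total


teacher = np.array([[0.95, 0.03, 0.02],
                    [0.50, 0.45, 0.05],
                    [0.40, 0.30, 0.30]])
groups, mask = split_by_confidence(teacher)
loss = joint_loss(teacher, groups, mask)
```

In this sketch the ambiguous instance never receives a single hard pseudo-label: it only tells the student which relations it is probably not, which is the intuition behind training on the negative-label set.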