STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction

Junjie Yu, Xing Wang, Jiangjiang Zhao, Chunjie Yang, Wenliang Chen


Abstract
We present STAD, a simple yet effective self-training approach for low-resource relation extraction. The approach first classifies the auto-annotated instances into two groups, confident instances and uncertain instances, according to the probabilities predicted by a teacher model. In contrast to most previous studies, which use only the confident instances for self-training, we also make use of the uncertain instances. To this end, we propose a method to identify ambiguous but useful instances among the uncertain instances and then, for each ambiguous instance, divide the relations into a candidate-label set and a negative-label set. Next, we propose a set-negative training method on the negative-label sets of the ambiguous instances and a positive training method for the confident instances. Finally, a joint-training method is proposed to build the final relation extraction system on all data. Experimental results on two widely used datasets, SemEval-2010 Task 8 and Re-TACRED, in low-resource settings demonstrate that this new self-training approach achieves significant and consistent improvements compared with several competitive self-training systems.
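The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give thresholds, selection rules, or loss formulas, so everything concrete below is an assumption made for illustration: the function names, the `confident_threshold`, `candidate_mass`, and `max_candidates` parameters, the rule of accumulating top-ranked labels until a probability mass is reached, and the form of the set-negative loss (penalizing the total student probability assigned to the negative-label set). It is not taken from the authors' code.

```python
import math


def partition_instance(probs, confident_threshold=0.9,
                       candidate_mass=0.9, max_candidates=2):
    """Sort one auto-annotated instance into confident / ambiguous / discarded.

    `probs` is the teacher's probability distribution over relations.
    All thresholds are hypothetical values chosen for this sketch.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Confident instance: the teacher's top prediction is trusted as a hard label.
    if probs[ranked[0]] >= confident_threshold:
        return {"kind": "confident", "label": ranked[0]}

    # Uncertain instance: gather top-ranked labels until they cover enough mass.
    candidates, mass = [], 0.0
    for i in ranked:
        candidates.append(i)
        mass += probs[i]
        if mass >= candidate_mass:
            break

    # Assumed rule: too many candidates means the instance is not useful.
    if len(candidates) > max_candidates:
        return {"kind": "discarded"}

    negatives = [i for i in range(len(probs)) if i not in candidates]
    return {"kind": "ambiguous",
            "candidates": sorted(candidates),
            "negatives": negatives}


def positive_loss(student_probs, label):
    """Standard negative log-likelihood on a confident pseudo-label."""
    return -math.log(student_probs[label])


def set_negative_loss(student_probs, negatives):
    """Assumed set-negative loss: push probability mass off the negative set."""
    neg_mass = sum(student_probs[i] for i in negatives)
    return -math.log(max(1.0 - neg_mass, 1e-12))


def joint_loss(batch, ambiguous_weight=1.0):
    """Sum both losses over (student_probs, partition) pairs; skip discarded."""
    total = 0.0
    for student_probs, part in batch:
        if part["kind"] == "confident":
            total += positive_loss(student_probs, part["label"])
        elif part["kind"] == "ambiguous":
            total += ambiguous_weight * set_negative_loss(
                student_probs, part["negatives"])
    return total
```

Under these assumptions, a teacher output such as `[0.5, 0.45, 0.05]` would yield an ambiguous instance with candidate labels `{0, 1}` and negative label `{2}`, so the student is only told which relation the instance is not, rather than being forced onto a possibly wrong hard label.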
Anthology ID:
2022.coling-1.178
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2044–2054
URL:
https://aclanthology.org/2022.coling-1.178
Cite (ACL):
Junjie Yu, Xing Wang, Jiangjiang Zhao, Chunjie Yang, and Wenliang Chen. 2022. STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2044–2054, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction (Yu et al., COLING 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.coling-1.178.pdf
Code
 jjyunlp/stad
Data
Re-TACRED, SemEval-2010 Task 8