STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction

Junjie Yu; Xing Wang; Jiangjiang Zhao; Chunjie Yang; Wenliang Chen

Abstract

We present a simple yet effective self-training approach, named STAD, for low-resource relation extraction. The approach first classifies the auto-annotated instances into two groups, confident instances and uncertain instances, according to the probabilities predicted by a teacher model. In contrast to most previous studies, which use only the confident instances for self-training, we also make use of the uncertain instances. To this end, we propose a method to identify ambiguous but useful instances among the uncertain instances and then divide the relations into a candidate-label set and a negative-label set for each ambiguous instance. Next, we propose a set-negative training method on the negative-label sets for the ambiguous instances and a positive training method for the confident instances. Finally, a joint-training method is proposed to build the final relation extraction system on all data. Experimental results on two widely used datasets, SemEval-2010 Task-8 and Re-TACRED, in low-resource settings demonstrate that this new self-training approach achieves significant and consistent improvements compared with several competitive self-training systems.
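The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how ambiguous instances are selected or what exact form the losses take. The thresholds (`confident_threshold`, `candidate_mass`), the top-k candidate rule, the function names, and the `-log(1 - negative mass)` form of the set-negative loss are all assumptions chosen for illustration.

```python
import math


def split_instance(probs, confident_threshold=0.9, top_k=2, candidate_mass=0.9):
    """Route one teacher prediction (a list of relation probabilities).

    Assumed rule, not taken from the paper: an instance is confident if its
    top probability reaches `confident_threshold`; it is ambiguous if its
    top-k relations together hold at least `candidate_mass` of the
    probability; otherwise it is discarded.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if probs[ranked[0]] >= confident_threshold:
        return {"kind": "confident", "label": ranked[0]}
    candidates = set(ranked[:top_k])
    if sum(probs[i] for i in candidates) >= candidate_mass:
        # Every relation outside the candidate-label set is treated as a negative label.
        return {"kind": "ambiguous", "candidates": candidates,
                "negatives": set(ranked[top_k:])}
    return {"kind": "discarded"}


def positive_loss(student_probs, label):
    """Standard cross-entropy on the pseudo-label of a confident instance."""
    return -math.log(student_probs[label])


def set_negative_loss(student_probs, negatives):
    """Assumed set-negative loss: penalize probability mass on the negative-label set."""
    negative_mass = sum(student_probs[i] for i in negatives)
    return -math.log(max(1.0 - negative_mass, 1e-12))


def joint_loss(batch):
    """Sum both losses over (student_probs, routed_instance) pairs."""
    total = 0.0
    for student_probs, routed in batch:
        if routed["kind"] == "confident":
            total += positive_loss(student_probs, routed["label"])
        elif routed["kind"] == "ambiguous":
            total += set_negative_loss(student_probs, routed["negatives"])
    return total
```

Under these assumed settings, a teacher output of `[0.5, 0.45, 0.05]` is routed as ambiguous with candidate-label set `{0, 1}` and negative-label set `{2}`, so the student is only told that the third relation is wrong rather than being forced to pick one of the two plausible relations.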

Anthology ID: 2022.coling-1.178
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 2044–2054
URL: https://aclanthology.org/2022.coling-1.178
Cite (ACL): Junjie Yu, Xing Wang, Jiangjiang Zhao, Chunjie Yang, and Wenliang Chen. 2022. STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2044–2054, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction (Yu et al., COLING 2022)
PDF: https://preview.aclanthology.org/nschneid-patch-1/2022.coling-1.178.pdf
Code: jjyunlp/stad
Data: Re-TACRED, SemEval-2010 Task-8