Mingyu Li


2022

1Cademy @ Causal News Corpus 2022: Leveraging Self-Training in Causality Classification of Socio-Political Event Data
Adam Nik | Ge Zhang | Xingran Chen | Mingyu Li | Jie Fu
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

This paper details our participation in the Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) workshop @ EMNLP 2022, where we take part in Subtask 1 of Shared Task 3 (Tan et al., 2022). We approach the given task of event causality detection by proposing a self-training pipeline that follows a teacher-student classifier method. More specifically, we first train a teacher model on the original, gold-labeled task data, then use that teacher model to self-label additional data, which is used to train a separate student model for the final task prediction. We test how restricting the number of positive or negative self-labeled examples in the self-training process affects classification performance. Our final results show that self-training improves performance across all models and self-labeled training sets tested on the task of event causality sequence classification. In addition, we find that self-training performance does not diminish even when the number of either positive or negative self-labeled examples used in training is restricted. Our code is publicly available at https://github.com/Gzhang-umich/1CademyTeamOfCASE.
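The teacher-student self-training loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword-counting classifier, the function names (`fit_keyword_model`, `select_pseudo_labels`, `self_train`), the 0.5 threshold, and the choice to train the student on gold plus self-labeled data are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, int]          # (sentence, label), 1 = causal, 0 = non-causal
Model = Callable[[str], float]     # returns a score in [0, 1] for the causal class


def fit_keyword_model(examples: List[Example]) -> Model:
    """Toy stand-in for a real classifier (hypothetical, for illustration only)."""
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in examples:
        (pos_counts if label == 1 else neg_counts).update(text.lower().split())

    def predict(text: str) -> float:
        words = text.lower().split()
        if not words:
            return 0.0
        # Fraction of words seen more often in causal than in non-causal examples.
        hits = sum(1 for w in words if pos_counts[w] > neg_counts[w])
        return hits / len(words)

    return predict


def select_pseudo_labels(teacher: Model, unlabeled: List[str],
                         max_pos: Optional[int] = None,
                         max_neg: Optional[int] = None,
                         threshold: float = 0.5) -> List[Example]:
    """Self-label unlabeled text, optionally capping positives or negatives."""
    scored = [(teacher(t), t) for t in unlabeled]
    # Most confident examples first within each class.
    pos = sorted([(p, t) for p, t in scored if p >= threshold], reverse=True)
    neg = sorted([(p, t) for p, t in scored if p < threshold])
    if max_pos is not None:
        pos = pos[:max_pos]
    if max_neg is not None:
        neg = neg[:max_neg]
    return [(t, 1) for _, t in pos] + [(t, 0) for _, t in neg]


def self_train(fit: Callable[[List[Example]], Model], labeled: List[Example],
               unlabeled: List[str], max_pos: Optional[int] = None,
               max_neg: Optional[int] = None) -> Tuple[Model, List[Example]]:
    teacher = fit(labeled)                      # step 1: teacher on gold data
    pseudo = select_pseudo_labels(teacher, unlabeled, max_pos, max_neg)
    student = fit(labeled + pseudo)             # step 2: separate student model
    return student, pseudo


labeled = [("protest because of taxes", 1), ("the march was peaceful", 0)]
unlabeled = ["strike because of wages", "the rally was calm", "riots because of cuts"]
student, pseudo = self_train(fit_keyword_model, labeled, unlabeled, max_pos=1)
```

Here `max_pos=1` keeps only one self-labeled positive example while leaving negatives uncapped, which mirrors the kind of restriction the abstract reports testing.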

1Cademy @ Causal News Corpus 2022: Enhance Causal Span Detection via Beam-Search-based Position Selector
Xingran Chen | Ge Zhang | Adam Nik | Mingyu Li | Jie Fu
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

In this paper, we present our approach and empirical observations for Cause-Effect Signal Span Detection, Subtask 2 of Shared Task 3 at CASE 2022. The shared task aims to extract the cause, effect, and signal spans from a given causal sentence. We model the task as a reading comprehension (RC) problem and apply a token-level RC-based span prediction paradigm to the task as the baseline. We explore different training objectives to fine-tune the model, as well as data augmentation (DA) tricks based on the language model (LM) for performance improvement. Additionally, we propose an efficient beam-search post-processing strategy to deal with the drawbacks of span detection and obtain a further performance gain. Our approach achieves an average F1 score of 54.15 and ranks 1st in the CASE competition. Our code is available at https://github.com/Gzhang-umich/1CademyTeamOfCASE.
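The abstract does not spell out the position selector, so the following is only a sketch of one common way to post-process token-level span predictions with a small beam of candidates. It assumes per-token start and end scores are available, and the function name `best_span` and its parameters are hypothetical. The idea is to keep the top-k start and end positions and choose the highest-scoring pair that forms a valid span, instead of taking the two argmax positions independently.

```python
from typing import List, Optional, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              beam_size: int = 3,
              max_len: Optional[int] = None) -> Optional[Tuple[float, int, int]]:
    """Return (score, start, end) for the best valid span among top-k candidates."""
    # Keep only the beam_size highest-scoring start and end positions.
    top_starts = sorted(range(len(start_scores)),
                        key=lambda i: start_scores[i], reverse=True)[:beam_size]
    top_ends = sorted(range(len(end_scores)),
                      key=lambda i: end_scores[i], reverse=True)[:beam_size]

    best = None
    for s in top_starts:
        for e in top_ends:
            if e < s:
                continue                      # end before start is not a span
            if max_len is not None and e - s + 1 > max_len:
                continue                      # span longer than allowed
            score = start_scores[s] + end_scores[e]
            if best is None or score > best[0]:
                best = (score, s, e)
    return best


start_scores = [0.1, 0.2, 2.0, 0.3, 0.0]
end_scores = [0.0, 3.0, 0.1, 1.5, 0.2]
greedy = best_span(start_scores, end_scores, beam_size=1)   # argmax pair only
beamed = best_span(start_scores, end_scores, beam_size=3)
```

In this toy input the independent argmax positions give start 2 and end 1, which is not a valid span, so the `beam_size=1` call returns `None`, while the wider beam recovers the span from token 2 to token 3.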