Xue Bai


2024

MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models
Zhenpeng Su | Xing Wu | Xue Bai | Zijia Lin | Hui Chen | Guiguang Ding | Wei Zhou | Songlin Hu
Findings of the Association for Computational Linguistics: NAACL 2024

Generative language models are usually pre-trained on large text corpora by predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge of text corpora during training, i.e., the imbalance between frequent tokens and infrequent ones. This imbalance can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate this, we propose a **MiLe Loss** function for **mi**tigating the bias of **le**arning difficulties with tokens. During training, it dynamically assesses the learning difficulty of a to-be-learned token according to the information entropy of the corresponding predicted probability distribution over the vocabulary. It then scales the training loss adaptively, leading the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at three scales: 468M, 1.2B, and 6.7B parameters. Experiments reveal that models incorporating the proposed MiLe Loss gain consistent performance improvements on downstream benchmarks.
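
The entropy-based scaling described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the weighting `(1 + H) ** gamma`, the function names, and the `gamma` parameter below are all assumptions made for illustration, not the authors' definition.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_scaled_loss(logits, target, gamma=1.0):
    """Cross-entropy for one token, scaled up when the prediction is uncertain.

    ASSUMPTION: the weight (1 + H) ** gamma is a hypothetical choice; the
    abstract only says the loss is scaled according to the entropy of the
    predicted distribution over the vocabulary.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    weight = (1.0 + entropy(probs)) ** gamma
    return weight * cross_entropy


# A confident prediction is weighted close to 1; a uniform one is weighted more.
confident = entropy_scaled_loss([10.0, 0.0, 0.0], target=0)
uncertain = entropy_scaled_loss([0.0, 0.0, 0.0], target=0)
print(confident, uncertain)
```

With `gamma=0` the sketch reduces to plain cross-entropy, which makes the effect of the scaling easy to compare against an unweighted baseline.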

Improving Continual Few-shot Relation Extraction through Relational Knowledge Distillation and Prototype Augmentation
Zhiheng Zhang | Daojian Zeng | Xue Bai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we focus on the challenging yet practical problem of Continual Few-shot Relation Extraction (CFRE), which involves extracting relations as new data arrives continuously and iteratively with only a few labeled examples. The main challenges in CFRE are overfitting due to few-shot learning and catastrophic forgetting caused by continual learning. To address these problems, we propose a novel framework called RK2DA, which seamlessly integrates prototype-based data augmentation and relational knowledge distillation. Specifically, RK2DA generates pseudo data by adding Gaussian noise to the prototype embeddings and uses a novel two-phase multi-teacher relational knowledge distillation method to transfer knowledge from different embedding spaces. Experimental results on the FewRel and TACRED datasets demonstrate that our method outperforms state-of-the-art baselines.
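
The prototype-based augmentation step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: taking the prototype to be the mean of a relation's example embeddings, the isotropic noise scale `sigma`, and both function names are hypothetical choices, since the abstract does not specify them.

```python
import random


def class_prototype(embeddings):
    """ASSUMPTION: the prototype is the element-wise mean of the embeddings."""
    dim = len(embeddings[0])
    count = len(embeddings)
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


def augment_prototype(prototype, num_samples, sigma=0.1, seed=0):
    """Generate pseudo embeddings by adding Gaussian noise to a prototype.

    sigma (noise standard deviation) and seed are hypothetical parameters
    introduced for this sketch.
    """
    rng = random.Random(seed)
    return [
        [x + rng.gauss(0.0, sigma) for x in prototype]
        for _ in range(num_samples)
    ]


# Two labeled examples of one relation, embedded in a 2-dimensional space.
support = [[1.0, 2.0], [3.0, 4.0]]
prototype = class_prototype(support)
pseudo_data = augment_prototype(prototype, num_samples=5)
print(prototype, len(pseudo_data))
```

A fixed seed keeps the pseudo data reproducible across runs, which helps when comparing augmentation settings.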

2009

Normalized Accessor Variety Combined with Conditional Random Fields in Chinese Word Segmentation
Saike He | Taozheng Zhang | Xue Bai | Xiaojie Wang | Yuan Dong
Proceedings of the Student Research Workshop

Multi-Task Learning in Conditional Random Fields for Chunking in Shallow Semantic Parsing
Saike He | Xiaojie Wang | Yuan Dong | Taozheng Zhang | Xue Bai
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1