Sendong Zhao


2022

Prompt Combines Paraphrase: Teaching Pre-trained Models to Understand Rare Biomedical Words
Haochun Wang | Chi Liu | Nuwa Xi | Sendong Zhao | Meizhi Ju | Shiwei Zhang | Ziheng Zhang | Yefeng Zheng | Bing Qin | Ting Liu
Proceedings of the 29th International Conference on Computational Linguistics

Prompt-based fine-tuning of pre-trained models has proven effective for many natural language processing tasks under few-shot settings in the general domain. However, prompt-based tuning in the biomedical domain has not been investigated thoroughly. Biomedical words are often rare in the general domain but ubiquitous in biomedical contexts, which dramatically degrades the performance of pre-trained models on downstream biomedical applications even after fine-tuning, especially in low-resource scenarios. We propose a simple yet effective approach that helps models learn rare biomedical words during prompt-based tuning. Experimental results show that our method achieves up to a 6% improvement on a biomedical natural language inference task, without any extra parameters or training steps, under few-shot vanilla prompt settings.
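
As a rough illustration of the idea in this abstract, the sketch below appends a plain-language paraphrase of a rare biomedical term to a cloze-style NLI prompt before scoring verbalizer tokens with a masked language model. The checkpoint name, glossary, prompt template, and label words are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch (not the authors' exact method): give the model a paraphrase
# of a rare biomedical term directly inside the prompt, then score the mask
# position against verbalizer tokens for the NLI labels.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-uncased"  # any masked-LM checkpoint; a biomedical one would be a natural choice
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)

# Hypothetical paraphrase source; in practice this could come from a
# biomedical dictionary or knowledge base.
glossary = {"dyspnea": "shortness of breath"}

def build_prompt(premise: str, hypothesis: str) -> str:
    # Insert the paraphrase right after any rare term it covers.
    for term, paraphrase in glossary.items():
        premise = premise.replace(term, f"{term} ({paraphrase})")
    # Cloze template: the mask is filled with a label word.
    return f"{premise} ? {tokenizer.mask_token} , {hypothesis}"

label_words = {"entailment": "yes", "contradiction": "no", "neutral": "maybe"}

text = build_prompt("the patient reports dyspnea on exertion.",
                    "the patient has trouble breathing.")
inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

# Score each label by the logit of its verbalizer token at the mask position.
scores = {lab: logits[tokenizer.convert_tokens_to_ids(word)].item()
          for lab, word in label_words.items()}
print(max(scores, key=scores.get))
```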

2021

Less Is More: Domain Adaptation with Lottery Ticket for Reading Comprehension
Haichao Zhu | Zekun Wang | Heng Zhang | Ming Liu | Sendong Zhao | Bing Qin
Findings of the Association for Computational Linguistics: EMNLP 2021

In this paper, we propose a simple few-shot domain adaptation paradigm for reading comprehension. We first identify the lottery subnetwork structure within the Transformer-based source-domain model via gradual magnitude pruning. Then, we fine-tune only the lottery subnetwork, a small fraction of the whole parameters, on the annotated target-domain data for adaptation. To obtain more adaptable subnetworks, we introduce self-attention attribution to weigh parameters, rather than simply pruning those with the smallest magnitudes, which can be seen as softly combining structured pruning and unstructured magnitude pruning. Experimental results show that our method outperforms full-model fine-tuning adaptation on four out of five domains when only a small amount of annotated data is available for adaptation. Moreover, introducing self-attention attribution preserves more parameters for important attention heads in the lottery subnetwork and improves target-domain model performance. Our further analyses reveal that, besides exploiting fewer parameters, the choice of subnetwork is critical to the effectiveness.
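
The sketch below illustrates the masking idea under stated assumptions: it scores the weights of a single attention projection by magnitude scaled with a per-head importance term (standing in for self-attention attribution), keeps the top fraction as the subnetwork, and masks gradients so only those weights are updated during adaptation. It prunes in one shot rather than gradually, and the head-importance values, keep ratio, and toy data are placeholders, not the paper's procedure.

```python
# Minimal sketch: attribution-weighted magnitude scoring of one attention
# projection, followed by fine-tuning only the selected "lottery" weights.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_heads, head_dim, hidden = 4, 16, 64
proj = nn.Linear(hidden, hidden)          # stand-in for one attention projection
head_importance = torch.rand(num_heads)   # would come from self-attention attribution

# Combine unstructured magnitude with a soft structured (per-head) weight:
# rows of the projection are grouped by attention head.
w = proj.weight.detach().abs()                          # (hidden, hidden)
per_head = head_importance.repeat_interleave(head_dim)  # (hidden,)
score = w * per_head.unsqueeze(1)

keep_ratio = 0.3
threshold = torch.quantile(score.flatten(), 1 - keep_ratio)
mask = (score >= threshold).float()

# Freeze pruned weights by zeroing their gradients on every backward pass.
proj.weight.register_hook(lambda grad: grad * mask)

# Dummy adaptation step on "target-domain" data.
opt = torch.optim.Adam(proj.parameters(), lr=1e-4)
x, y = torch.randn(8, hidden), torch.randn(8, hidden)
loss = nn.functional.mse_loss(proj(x), y)
loss.backward()
opt.step()
print(f"kept {int(mask.sum())}/{mask.numel()} weights in the subnetwork")
```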