2024
pdf
abs
LoRA Meets Dropout under a Unified Framework
Sheng Wang
|
Liheng Chen
|
Jiyue Jiang
|
Boyang Xue
|
Lingpeng Kong
|
Chuan Wu
Findings of the Association for Computational Linguistics ACL 2024
With the remarkable capabilities, large language models (LLMs) have emergedas essential elements in numerous NLP applications, while parameter-efficientfinetuning, especially LoRA, has gained popularity as a lightweight approachfor model customization. Meanwhile, various dropout methods, initially designedfor full finetuning with all the parameters updated, alleviates overfittingassociated with excessive parameter redundancy. Hence, a possible contradictionarises from negligible trainable parameters of LoRA and the effectiveness ofprevious dropout methods, which has been largely overlooked. To fill this gap,we first confirm that parameter-efficient LoRA is also overfitting-prone. Wethen revisit transformer-specific dropout methods, and establish theirequivalence and distinctions mathematically and empirically. Building upon thiscomparative analysis, we introduce a unified framework for a comprehensiveinvestigation, which instantiates these methods based on dropping position,structural pattern and compensation measure. Through this framework, we revealthe new preferences and performance comparisons of them when involved withlimited trainable parameters. This framework also allows us to amalgamate themost favorable aspects into a novel dropout method named HiddenKey. Extensiveexperiments verify the remarkable superiority and sufficiency of HiddenKeyacross multiple models and tasks, which highlights it as the preferred approachfor high-performance and parameter-efficient finetuning of LLMs.
pdf
abs
PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA
Sheng Wang
|
Boyang Xue
|
Jiacheng Ye
|
Jiyue Jiang
|
Liheng Chen
|
Lingpeng Kong
|
Chuan Wu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the rapid scaling of large language models (LLMs), serving numerouslow-rank adaptations (LoRAs) concurrently has become increasingly impractical,leading to unaffordable costs and necessitating more parameter-efficientfinetuning methods. In this work, we introduce Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA), an intra-layer sharing mechanism comprising fouressential components: broadcast reduction, rotation enhancement,partially-sharing refinement, and rectified initialization strategy. As asuperset of LoRA, PRoLoRA retains its advantages, and effectively circumventthe drawbacks of peer parameter-sharing methods with superior model capacity,practical feasibility, and broad applicability. Empirical experimentsdemonstrate the remarkably higher parameter efficiency of PRoLoRA in bothspecific parameter budget and performance target scenarios, and its scalabilityto larger LLMs. Notably, with one time less trainable parameters, PRoLoRA stilloutperforms LoRA on multiple instruction tuning datasets. Subsequently, anablation study is conducted to validate the necessity of individual componentsand highlight the superiority of PRoLoRA over three potential variants.Hopefully, the conspicuously higher parameter efficiency can establish PRoLoRAas a resource-friendly alternative to LoRA.
2023
pdf
abs
A Cognitive Stimulation Dialogue System with Multi-source Knowledge Fusion for Elders with Cognitive Impairment
Jiyue Jiang
|
Sheng Wang
|
Qintong Li
|
Lingpeng Kong
|
Chuan Wu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
When communicating with elders with cognitive impairment, cognitive stimulation (CS) help to maintain the cognitive health of elders. Data sparsity is the main challenge in building CS-based dialogue systems, particularly in the Chinese language. To fill this gap, we construct a Chinese CS conversation (CSConv) dataset, which contains about 2.6K groups of dialogues with therapy principles and emotional support strategy labels. Making chit chat while providing emotional support is overlooked by the majority of existing cognitive dialogue systems. In this paper, we propose a multi-source knowledge fusion method for CS dialogue (CSD), to generate open-ended responses guided by the therapy principle and emotional support strategy. We first use a progressive mask method based on external knowledge to learn encoders as effective classifiers, which is the prerequisite to predict the therapy principle and emotional support strategy of the target response. Then a decoder interacts with the perceived therapy principle and emotional support strategy to generate responses. Extensive experiments conducted on the CSConv dataset demonstrate the effectiveness of the proposed method, while there is still a large space for improvement compared to human performance.
2020
pdf
abs
WN-Salience: A Corpus of News Articles with Entity Salience Annotations
Chuan Wu
|
Evangelos Kanoulas
|
Maarten de Rijke
|
Wei Lu
Proceedings of the Twelfth Language Resources and Evaluation Conference
Entities can be found in various text genres, ranging from tweets and web pages to user queries submitted to web search engines. Existing research either considers all entities in the text equally important, or heuristics are used to measure their salience. We believe that a key reason for the relatively limited work on entity salience is the lack of appropriate datasets. To support research on entity salience, we present a new dataset, the WikiNews Salience dataset (WN-Salience), which can be used to benchmark tasks such as entity salience detection and salient entity linking. WN-Salience is built on top of Wikinews, a Wikimedia project whose mission is to present reliable news articles. Entities in Wikinews articles are identified by the authors of the articles and are linked to Wikinews categories when they are salient or to Wikipedia pages otherwise. The dataset is built automatically, and consists of approximately 7,000 news articles, and 90,000 in-text entity annotations. We compare the WN-Salience dataset against existing datasets on the task and analyze their differences. Furthermore, we conduct experiments on entity salience detection; the results demonstrate that WN-Salience is a challenging testbed that is complementary to existing ones.