Yau-Shian Wang
2026
Attn-GS: Attention-Guided Context Compression for Efficient Personalized LLMs
Shenglai Zeng | Tianqi Zheng | Chuan Tian | Dante Everaert | Yau-Shian Wang | Yupin Huang | Michael J. Morais | Rohit Patki | Jinjin Tian | Xinnan Dai | Kai Guo | Monica Xiao Cheng | Hui Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Shenglai Zeng | Tianqi Zheng | Chuan Tian | Dante Everaert | Yau-Shian Wang | Yupin Huang | Michael J. Morais | Rohit Patki | Jinjin Tian | Xinnan Dai | Kai Guo | Monica Xiao Cheng | Hui Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Personalizing large language models (LLMs) to individual users requires incorporating extensive interaction histories and profiles, but input token constraints make this impractical due to high inference latency and API costs. Existing approaches rely on heuristic methods such as selecting recent interactions or prompting summarization models to compress user profiles. However, these methods treat context as a monolithic whole and fail to consider how LLMs internally process and prioritize different profile components. We investigate whether LLMs’ attention patterns can effectively identify important personalization signals for intelligent context compression. Through preliminary studies on representative personalization tasks, we discover that (a) LLMs’ attention patterns naturally reveal important signals, and (b) fine-tuning enhances LLMs’ ability to distinguish between relevant and irrelevant information. Based on these insights, we propose Attn-GS, an attention-guided context compression framework that leverages attention feedback from a marking model to mark important personalization sentences, then guides a compression model to generate task-relevant, high-quality compressed user contexts. Extensive experiments demonstrate that Attn-GS significantly outperforms various baselines across different tasks, token limits, and settings, achieving performance close to using full context while reducing token usage by 50 times.
2024
Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM
Ruohong Zhang | Yau-Shian Wang | Yiming Yang
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Ruohong Zhang | Yau-Shian Wang | Yiming Yang
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
The remarkable performance of large language models (LLMs) in zero-shot language understanding has garnered significant attention.However, employing LLMs for large-scale inference or domain-specific fine-tuning requires immense computational resources due to their substantial model size. To overcome these limitations, we introduce a novel method, namely GenCo, which leverages the strong generative power of LLMs to assist in training a smaller and more adaptable language model. In our method, an LLM plays an important role in the self-training loop of a smaller model in two important ways. Firstly, we utilize an LLM to generate multiple augmented texts for each input instance to enhance its semantic meaning for better understanding. Secondly, we additionally generate high-quality training instances conditioned on predicted labels, ensuring the generated texts are relevant to the labels. In this way, GenCo not only corrects the errors of predicted labels during self-training but also eliminates the need for extensive unlabeled texts. In our experiments, GenCo outperforms previous state-of-the-art methods when only limited (<5% of original) in-domain text data is available. Notably, our approach surpasses Alpaca-7B with human instructions, highlighting the significance of self-training.
2023
Long-tailed Extreme Multi-label Text Classification by the Retrieval of Generated Pseudo Label Descriptions
Ruohong Zhang | Yau-Shian Wang | Yiming Yang | Donghan Yu | Tom Vu | Likun Lei
Findings of the Association for Computational Linguistics: EACL 2023
Ruohong Zhang | Yau-Shian Wang | Yiming Yang | Donghan Yu | Tom Vu | Likun Lei
Findings of the Association for Computational Linguistics: EACL 2023
Extreme Multi-label Text Classification (XMTC) has been a tough challenge in machine learning research and applications due to the sheer sizes of the label spaces and the severe data scarcity problem associated with the long tail of rare labels in highly skewed distributions. This paper addresses the challenge of tail label prediction by leveraging the power of dense neural retrieval model in mapping input documents (as queries) to relevant label descriptions. To further enhance the quality of label descriptions, we propose to generate pseudo label descriptions from a trained bag-of-words (BoW) classifier, which demonstrates better classification performance under severe scarce data conditions. The proposed approach achieves the state-of-the-art (SOTA) performance of overall label prediction on XMTC benchmark datasets and especially outperforms the SOTA models in the tail label prediction. We also provide a theoretical analysis for relating the BoW and neural models w.r.t. performance lower bound.
PESCO: Prompt-enhanced Self Contrastive Learning for Zero-shot Text Classification
Yau-Shian Wang | Ta-Chung Chi | Ruohong Zhang | Yiming Yang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Yau-Shian Wang | Ta-Chung Chi | Ruohong Zhang | Yiming Yang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present PESCO, a novel contrastive learning framework that substantially improves the performance of zero-shot text classification. We formulate text classification as a neural text retrieval problem where each document is treated as a query, and the system learns the mapping from each query to the relevant class labels by (1) adding prompts to enhance label retrieval, and (2) using retrieved labels to enrich the training set in a self-training loop of contrastive learning. PESCO achieves state-of-the-art performance on four benchmark text classification datasets. On DBpedia, we achieve 98.5% accuracy without any labeled data, which is close to the fully-supervised result. Extensive experiments and analyses show all the components of PESCO are necessary for improving the performance of zero-shot text classification.