Ziyong Lin
2025
Look Both Ways and No Sink: Converting LLMs into Text Encoders without Training
Ziyong Lin | Haoyi Wu | Shu Wang | Kewei Tu | Zilong Zheng | Zixia Jia
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advancements have demonstrated the advantage of converting pretrained large language models into powerful text encoders by enabling bidirectional attention in transformer layers. However, existing methods often require extensive training on large-scale datasets, posing challenges in low-resource, domain-specific scenarios. In this work, we show that a pretrained large language model can be converted into a strong text encoder without additional training. We first conduct a comprehensive empirical study to investigate different conversion strategies and identify the impact of the attention sink phenomenon on the performance of converted encoder models. Based on our findings, we propose a novel approach that enables bidirectional attention and suppresses the attention sink phenomenon, resulting in superior performance. Extensive experiments on multiple domains demonstrate the effectiveness of our approach. Our work provides new insights into the training-free conversion of text encoders in low-resource scenarios and contributes to the advancement of domain-specific text representation generation. Our code is available at https://github.com/bigai-nlco/Look-Both-Ways-and-No-Sink.
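As a rough illustration of the two ingredients named in the abstract (bidirectional attention and attention-sink suppression), the PyTorch sketch below builds an additive attention mask that lets every token attend to every other token while blocking attention to an assumed sink position (the first token), and mean-pools hidden states into a sentence embedding. This is a minimal sketch under those assumptions, not the authors' released method; `sink_index` and the pooling choice are illustrative.

```python
import torch

def bidirectional_no_sink_mask(seq_len: int, sink_index: int = 0) -> torch.Tensor:
    """Additive attention mask: every token may attend to every other token
    (bidirectional), while attention *to* the assumed sink token is suppressed."""
    mask = torch.zeros(seq_len, seq_len)        # 0.0 = attend freely
    mask[:, sink_index] = float("-inf")         # block the sink column for all queries
    mask[sink_index, sink_index] = 0.0          # avoid an all -inf row when seq_len == 1
    return mask

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the final hidden states over non-padding tokens."""
    mask = attention_mask.unsqueeze(-1).float()                       # (batch, seq, 1)
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Illustrative usage with PyTorch's built-in attention (random tensors only).
q = k = v = torch.randn(1, 8, 16, 64)                                 # (batch, heads, seq, head_dim)
out = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, attn_mask=bidirectional_no_sink_mask(seq_len=16)
)
```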
2024
Varying Sentence Representations via Condition-Specified Routers
Ziyong Lin | Quansen Wang | Zixia Jia | Zilong Zheng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Semantic similarity between two sentences is inherently subjective and can vary significantly based on the specific aspects emphasized. Consequently, traditional sentence encoders must be capable of generating conditioned sentence representations that account for diverse conditions or aspects. In this paper, we propose a novel yet efficient framework based on transformer-style language models that facilitates advanced conditioned sentence representation while maintaining model parameters and computational efficiency. Empirical evaluations on the Conditional Semantic Textual Similarity and Knowledge Graph Completion tasks demonstrate the superiority of our proposed framework.
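The sketch below illustrates only the general idea of condition-specified routing and is not the paper's architecture: it assumes a precomputed base sentence embedding and a condition embedding, and routes the sentence embedding through a small bank of hypothetical linear experts weighted by a router conditioned on the aspect.

```python
import torch
import torch.nn as nn

class ConditionRouter(nn.Module):
    """Illustrative router: mixes lightweight linear experts according to the condition."""

    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)                             # scores experts from the condition
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])

    def forward(self, sentence_emb: torch.Tensor, condition_emb: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.router(condition_emb), dim=-1)           # (batch, num_experts)
        expert_outs = torch.stack([e(sentence_emb) for e in self.experts], dim=1)  # (batch, E, dim)
        return (weights.unsqueeze(-1) * expert_outs).sum(dim=1)               # condition-specific embedding

# Illustrative usage with random tensors.
router = ConditionRouter(dim=768)
sent = torch.randn(2, 768)        # base sentence embeddings from any encoder
cond = torch.randn(2, 768)        # embeddings of the condition / aspect text
conditioned = router(sent, cond)  # (2, 768)
```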
Co-authors
- Zixia Jia 2
- Zilong Zheng 2
- Kewei Tu 1
- Quansen Wang 1
- Shu Wang 1
- Haoyi Wu 1