Wooseong Yang
2025
LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation
Weizhi Zhang | Liangwei Yang | Wooseong Yang | Henry Peng Zou | Yuqing Liu | Ke Xu | Sourav Medya | Philip S. Yu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Collaborative filtering (CF) is widely adopted in industrial recommender systems (RecSys) for modeling user-item interactions across numerous applications, but it often struggles in cold-start and data-sparse scenarios. Recent advancements in pre-trained large language models (LLMs) with rich semantic knowledge offer promising solutions to these challenges. However, deploying LLMs at scale is hindered by their significant computational demands and latency. In this paper, we propose a novel and scalable LLM-RecSys framework, LLMInit, designed to integrate pre-trained LLM embeddings into CF models through selective initialization strategies. Specifically, we identify the embedding collapse issue that arises when CF models scale their embeddings to match the large embedding sizes of LLMs, and we avoid the problem by introducing efficient sampling methods, including random, uniform, and variance-based selections. Comprehensive experiments on multiple real-world datasets demonstrate that LLMInit significantly improves recommendation performance while maintaining low computational costs, offering a practical and scalable solution for industrial applications. To facilitate industry adoption and promote future research, we provide open-source access to our implementation at https://github.com/DavidZWZ/LLMInit.
2023
ConfliBERT-Arabic: A Pre-trained Arabic Language Model for Politics, Conflicts and Violence
Sultan Alsarra | Luay Abdeljaber | Wooseong Yang | Niamat Zawad | Latifur Khan | Patrick Brandt | Javier Osorio | Vito D’Orazio
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
This study investigates the use of Natural Language Processing (NLP) methods to analyze politics, conflicts, and violence in the Middle East using domain-specific pre-trained language models. We present ConfliBERT-Arabic, a pre-trained Arabic language model that can efficiently analyze political, conflict-, and violence-related texts. Our technique refines a pre-trained model using a corpus of Arabic texts about regional politics and conflicts, and we compare the performance of our models to baseline BERT models. Our findings show that the performance of NLP models for Middle Eastern politics and conflict analysis is enhanced by the use of domain-specific pre-trained local language models. This study offers political and conflict analysts, including policymakers, scholars, and practitioners, new approaches and tools for deciphering the intricate dynamics of local politics and conflicts directly in Arabic.