Chao Huang

2025

Modern recommender systems aim to deeply understand users’ complex preferences through their past interactions. While deep collaborative filtering approaches using Graph Neural Networks (GNNs) excel at capturing user-item relationships, their effectiveness is limited when handling sparse data or zero-shot scenarios, primarily due to constraints in ID-based embedding functions. To address these challenges, we propose a model-agnostic recommendation instruction-tuning paradigm that seamlessly integrates large language models with collaborative filtering. Our proposed Recommendation Language Model (RecLM) enhances the capture of user preference diversity through a carefully designed reinforcement learning reward function that facilitates self-augmentation of language models. Comprehensive evaluations demonstrate significant advantages of our approach across various settings, and its plug-and-play compatibility with state-of-the-art recommender systems results in notable performance enhancements.

The softmax function is crucial in Transformer attention, which normalizes each row of the attention scores with summation to one. **Usually, tokens with larger attention scores are important for the final prediction.However, the softmax function can face a gradient vanishing issue for such important tokens (e.g., probabilities close to one), leading to optimization difficulties for the important tokens so that the performance may not be better.**In this paper, we propose Self-Adjust Softmax (SA-Softmax) to address this issue by modifying softmax(z) to z ⋅ softmax(z) and its normalized variant ^{(z - min(z_\min,0))}⁄_{max(0,z_max)-min(z_min,0)} ⋅ softmax(z).We theoretically show that SA-Softmax provides enhanced gradient properties compared to the vanilla softmax function.Moreover, Attention can be seamlessly integrated into existing Transformer models to their attention mechanisms with minor adjustments.We conducted experiments to evaluate the empirical performance of Transformer models using compared to the vanilla softmax function. These experiments, involving models with up to 2.7 billion parameters, are conducted across diverse datasets, language tasks, and positional encoding methods.

This work addresses a fundamental barrier in recommender systems: the inability to generalize across domains without extensive retraining. Traditional ID-based approaches fail entirely in cold-start and cross-domain scenarios where new users or items lack sufficient interaction history. Inspired by foundation models’ cross-domain success, we develop a foundation model for sequential recommendation that achieves genuine zero-shot generalization capabilities. Our approach fundamentally departs from existing ID-based methods by deriving item representations exclusively from textual features. This enables immediate embedding of any new item without model retraining. We introduce unified item tokenization with Finite Scalar Quantization that transforms heterogeneous textual descriptions into standardized discrete tokens. This eliminates domain barriers that plague existing systems. Additionally, the framework features hybrid bidirectional-causal attention that captures both intra-item token coherence and inter-item sequential dependencies. An efficient catalog-aware beam search decoder enables real-time token-to-item mapping. Unlike conventional approaches confined to their training domains, RecGPT naturally bridges diverse recommendation contexts through its domain-invariant tokenization mechanism. Comprehensive evaluations across six datasets and industrial scenarios demonstrate consistent performance advantages.

Multilingual dense retrieval aims to retrieve relevant documents across different languages based on a unified retriever model. The challenge lies in aligning representations of different languages in a shared vector space. The common practice is to fine-tune the dense retriever via contrastive learning, whose effectiveness highly relies on the quality of the negative sample and the efficacy of mini-batch data. Different from the existing studies that focus on developing sophisticated model architecture, we propose a method to boost data utilization for multilingual dense retrieval by obtaining high-quality hard negative samples and effective mini-batch data. The extensive experimental results on a multilingual retrieval benchmark, MIRACL, with 16 languages demonstrate the effectiveness of our method by outperforming several existing strong baselines.

pdf bib abs
EasyRec: Simple yet Effective Language Models for Recommendation
Xubin Ren | Chao Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Deep neural networks have emerged as a powerful technique for learning representations from user-item interaction data in collaborative filtering (CF) for recommender systems. However, many existing methods heavily rely on unique user and item IDs, which restricts their performance in zero-shot learning scenarios. Inspired by the success of language models (LMs) and their robust generalization capabilities, we pose the question: How can we leverage language models to enhance recommender systems? We propose EasyRec, an effective approach that integrates text-based semantic understanding with collaborative signals. EasyRec employs a text-behavior alignment framework that combines contrastive learning with collaborative language model tuning. This ensures strong alignment between text-enhanced semantic representations and collaborative behavior information. Extensive evaluations across diverse datasets show EasyRec significantly outperforms state-of-the-art models, particularly in text-based zero-shot recommendation. EasyRec functions as a plug-and-play component that integrates seamlessly into collaborative filtering frameworks. This empowers existing systems with improved performance and adaptability to user preferences. Implementation codes are publicly available at: https://github.com/HKUDS/EasyRec

Real-world data combines structured (e.g., graph connections) and unstructured (e.g., text, visuals) formats, capturing explicit relationships (e.g., social links) and implicit semantic interdependencies (e.g., knowledge graphs). We propose GraphAgent, an automated agent pipeline addressing both explicit and implicit graph-enhanced semantic dependencies for predictive (e.g., node classification) and generative (e.g., text generation) tasks. GraphAgent integrates three components: (i) a Graph Generator Agent creating knowledge graphs for semantic dependencies; (ii) a Task Planning Agent interpreting user queries and formulating tasks via self-planning; and (iii) a Task Execution Agent automating task execution with tool matching. These agents combine language and graph language models to reveal complex relational and semantic patterns. Extensive experiments on diverse datasets validate GraphAgent’s effectiveness in graph-related predictive and text generative tasks. GraphAgent is open-sourced at: https://anonymous.4open.science/r/GraphAgent-Submit-6F52/.

In recent years, large language models (LLMs) have made significant progress in knowledge-intensive applications. However, when adapting them to specific domains, we may encounter a multi-stage continuous learning scenario, especially in cases where domain knowledge evolves rapidly.This issue severely limits traditional fine-tuning approaches for LLMs.To overcome this limitation, we propose a new learning paradigm designed specifically for multi-stage continuous learning. This paradigm includes a preference-based learning bias to identify potential knowledge conflicts, as well as a self-distillation-based data augmentation strategy to expand and enrich the training corpus, thereby improving the integration of knowledge-compatible information.In the experiments, we show that our proposed method achieves a significant improvement in accuracy after 7 stages of fine-tuning compared to previous methods, while also demonstrating excellent performance in preserving general knowledge.We have released our code and dataset at Multi-Stage-Learning.

The recent advancement of Multimodal Large Language Models (MLLMs) has significantly improved their fine-grained perception of single images and general comprehension across multiple images. However, existing MLLMs still face challenges in achieving precise grounding in complex multi-image scenarios. To address this, we first explore a Chain-of-Thought (CoT) framework that integrates single-image grounding with multi-image comprehension. While partially effective, it remains unstable and struggles to capture abstract visual information due to its non-end-to-end nature. Therefore, we introduce Migician, the first multi-image grounding model capable of performing free-form and accurate grounding across multiple images. To support this, we present the MGrounding-630k dataset, which comprises data for several multi-image grounding tasks derived from existing datasets, along with newly generated free-form grounding instruction-following data. Furthermore, we propose MIG-Bench, a comprehensive benchmark specifically designed for evaluating multi-image grounding capabilities. Experimental results demonstrate that our model achieves significantly superior multi-image grounding capabilities, outperforming the best existing MLLMs by 24.94% and even surpassing much larger 70B models. Our code, model, dataset, and benchmark are fully open-sourced at https://migician-vg.github.io/.

Digital agents for automating tasks across different platforms by directly manipulating the GUIs are increasingly important. For these agents, grounding from language instructions to target elements remains a significant challenge due to reliance on HTML or AXTree inputs. In this paper, we introduce Aria-UI, a large multimodal model specifically designed for GUI grounding. Aria-UI adopts a pure-vision approach, eschewing reliance on auxiliary inputs. To adapt to heterogeneous planning instructions, we propose a scalable data pipeline that synthesizes diverse and high-quality instruction samples for grounding. To handle dynamic contexts in task performing, Aria-UI incorporates textual and text-image interleaved action histories, enabling robust context-aware reasoning for grounding. Aria-UI sets new state-of-the-art results across offline and online agent benchmarks, outperforming both vision-only and AXTree-reliant baselines. We release all training data and model checkpoints to foster further research.

pdf bib abs
LightRAG: Simple and Fast Retrieval-Augmented Generation
Zirui Guo | Lianghao Xia | Yanhua Yu | Tu Ao | Chao Huang
Findings of the Association for Computational Linguistics: EMNLP 2025

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs. However, existing RAG systems have significant limitations, including reliance on flat data representations and inadequate contextual awareness, which can lead to fragmented answers that fail to capture complex interdependencies. To address these challenges, we propose LightRAG, a novel framework that incorporates graph structures into text indexing and retrieval processes. This innovative approach employs a dual-level retrieval system that enhances comprehensive information retrieval from both low- and high-level knowledge discovery. Additionally, the integration of graph structures with vector representations facilitates efficient retrieval of related entities and their relationships, significantly improving response times while maintaining contextual relevance. This capability is further enhanced by an incremental update algorithm that ensures the timely integration of new data, allowing the system to remain effective and responsive in rapidly changing data environments. Extensive experimental validation demonstrates considerable improvements in retrieval accuracy and efficiency compared to existing approaches. We have made our LightRAG framework open source and anonymously available at the link: https://anonymous.4open.science/r/LightRAG-2BEE.

2024

“This paper introduces a novel crowdsourcing worker selection algorithm, enhancing annotationquality and reducing costs. Unlike previous studies targeting simpler tasks, this study con-tends with the complexities of label interdependencies in sequence labeling. The proposedalgorithm utilizes a Combinatorial Multi-Armed Bandit (CMAB) approach for worker selec-tion, and a cost-effective human feedback mechanism. The challenge of dealing with imbal-anced and small-scale datasets, which hinders offline simulation of worker selection, is tack-led using an innovative data augmentation method termed shifting, expanding, and shrink-ing (SES). Rigorous testing on CoNLL 2003 NER and Chinese OEI datasets showcased thealgorithm’s efficiency, with an increase in F1 score up to 100.04% of the expert-only base-line, alongside cost savings up to 65.97%. The paper also encompasses a dataset-independenttest emulating annotation evaluation through a Bernoulli distribution, which still led to animpressive 97.56% F1 score of the expert baseline and 59.88% cost savings. Furthermore,our approach can be seamlessly integrated into Reinforcement Learning from Human Feed-back (RLHF) systems, offering a cost-effective solution for obtaining human feedback. All re-sources, including source code and datasets, are available to the broader research community athttps://github.com/blcuicall/nlp-crowdsourcing.”

pdf bib abs
XRec: Large Language Models for Explainable Recommendation
Qiyao Ma | Xubin Ren | Chao Huang
Findings of the Association for Computational Linguistics: EMNLP 2024

Recommender systems help users navigate information overload by providing personalized recommendations aligned with their preferences. Collaborative Filtering (CF) is a widely adopted approach, but while advanced techniques like graph neural networks (GNNs) and self-supervised learning (SSL) have enhanced CF models for better user representations, they often lack the ability to provide explanations for the recommended items. Explainable recommendations aim to address this gap by offering transparency and insights into the recommendation decision-making process, enhancing users’ understanding. This work leverages the language capabilities of Large Language Models (LLMs) to push the boundaries of explainable recommender systems. We introduce a model-agnostic framework called XRec, which enables LLMs to provide comprehensive explanations for user behaviors in recommender systems. By integrating collaborative signals and designing a lightweight collaborative adaptor, the framework empowers LLMs to understand complex patterns in user-item interactions and gain a deeper understanding of user preferences. Our extensive experiments demonstrate the effectiveness of XRec, showcasing its ability to generate comprehensive and meaningful explanations that outperform baseline approaches in explainable recommender systems.

pdf bib abs
OpenGraph: Towards Open Graph Foundation Models
Lianghao Xia | Ben Kao | Chao Huang
Findings of the Association for Computational Linguistics: EMNLP 2024

Graph learning has become essential in various domains, including recommendation systems and social network analysis. Graph Neural Networks (GNNs) have emerged as promising techniques for encoding structural information and improving performance in tasks like link prediction and node classification. However, a key challenge remains: the difficulty of generalizing to unseen graph data with different properties. In this work, we propose a novel graph foundation model, called OpenGraph, to address this challenge. Our approach tackles several technical obstacles. Firstly, we enhance data augmentation using a large language model (LLM) to overcome data scarcity in real-world scenarios. Secondly, we introduce a unified graph tokenizer that enables the model to generalize effectively to diverse graph data, even when encountering unseen properties during training. Thirdly, our developed scalable graph transformer captures node-wise dependencies within the global topological context. Extensive experiments validate the effectiveness of our framework. By adapting OpenGraph to new graph characteristics and comprehending diverse graphs, our approach achieves remarkable zero-shot graph learning performance across various settings. We release the model implementation at https://github.com/HKUDS/OpenGraph.

Co-authors

Venues

Fix author