This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
LiangyueLi
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
To advance personalized applications such as recommendation systems and user behavior prediction, recent research increasingly adopts large language models (LLMs) for human-readable persona modeling. In dynamic real-world scenarios, effective persona modeling necessitates leveraging streaming behavior data to continually optimize user personas.However, existing methods—whether regenerating personas or incrementally extending them with new behaviors—often fail to achieve sustained improvements in persona quality or future behavior prediction accuracy. To address this, we propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization. Specifically, we enhance the model’s direction-search capability through an iterative reinforcement learning framework, allowing it to automatically identify effective update directions and optimize personas using discrepancies between user behaviors and model predictions.Extensive experiments on dynamic persona modeling involving 4,800 users across 10 domains highlight ’s superior persona optimization capabilities, delivering an impressive 32.2% average reduction in user behavior prediction error over four update rounds—outperforming the best baseline by a remarkable 22.92%.
As large language models (LLMs) have demonstrated impressive multiple step-by-step reasoning capabilities in recent natural language processing (NLP) reasoning tasks, many studies are interested in distilling reasoning abilities into smaller language models (SLMs) via fine-tuning. Previous distillation methods usually utilize the capabilities of LLMs to generate chain-of-thought (CoT) samples to teach SLMs. However, this distillation approach performs poorly in certain scenarios due to the limitations of CoT. In this work, we introduce a novel Mixed Distillation (MD) framework, distilling multiple step-by-step reasoning abilities into SLMs. First, we leverage LLMs to generate multiple step-by-step reasoning rationales by sampling automatically. Then, we create high-quality, well-balanced mixed thought data and design a novel multi-task loss to help SLMs better learn and adaptively activate multiple step-by-step reasoning. Our extensive experiments demonstrate that MD enhances both single-path (using either CoT or PoT) and multi-path (using both CoT and PoT) reasoning abilities of SLMs during inference across reasoning tasks. Notably, a single model generated by MD exceeds the comprehensive performance of an ensemble of two individual CoT and PoT distilled models. Mistral-7B using MD can achieve remarkable improvements of 87.5%, 74.0% and 77.1% on SVAMP, GSM8K and ASDIV, respectively, outperforming the teacher model, GPT-3.5-Turbo. We hope our work provides insight into SLMs’ multiple step-by-step reasoning abilities.
Hierarchical text classification (HTC) is an important task with broad applications, and few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely ambiguous labels. In this work, we introduce the first ICL-based framework with LLM for few-shot HTC. We exploit a retrieval database to identify relevant demonstrations, and an iterative policy to manage multi-layer hierarchical labels. Particularly, we equip the retrieval database with HTC label-aware representations for the input texts, which is achieved by continual training on a pretrained language model with masked language modeling (MLM), layer-wise classification (CLS, specifically for HTC), and a novel divergent contrastive learning (DCL, mainly for adjacent semantically similar labels) objective. Experimental results on three benchmark datasets demonstrate superior performance of our method, and we can achieve state-of-the-art results in few-shot HTC.