Yaqin Chen

2026

In the realm of domain-specific natural language understanding (NLU) tasks, acquiring high-quality labeled data is often arduous, thereby posing significant challenges for effective model training. Multi-task learning (MTL) addresses these limitations by jointly optimizing multiple tasks within a unified framework. In this paper, we introduce a novel sparse NLU multi-task learning framework that decomposes the language model into modular skill components and employs a dynamic, learnable skill-combination mechanism to adaptively handle diverse tasks. Extensive experiments on benchmark NLU datasets demonstrate that our proposed method surpasses conventional multi-task learning approaches in performance.

Memory serves as a pivotal component in interactive response generation, supplying essential background information and referential knowledge for dialogues. Conventional interactive algorithms have predominantly treated memory as a mere contextual element, largely neglecting the nuanced cognitive processes involved in individualized memory encoding and retrieval. This conceptual gap has led to the prevailing schema in which memory-enhanced dialogue datasets incorporate monolithic, undifferentiated memory content, failing to capture the personalized nature of personal memory processing. Grounded in the self-reference effect from cognitive psychology, we introduce a Multi-Turn Dialogue Dataset with Personalized Contextual Memory (), establishing a comprehensive benchmark to facilitate advanced research on personalized memory processing algorithms.

2025

Palette of Language Models: A Solver for Controlled Text Generation
Zhe Yang | Yi Huang | Yaqin Chen | Xiaoting Wu | Junlan Feng | Chao Deng
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Recent advancements in large language models have revolutionized text generation with their remarkable capabilities. These models can produce controlled texts that closely adhere to specific requirements when prompted appropriately. However, designing an optimal prompt to control multiple attributes simultaneously can be challenging. A common approach is to linearly combine single-attribute models, but this strategy often overlooks attribute overlaps and can lead to conflicts. Therefore, we propose a novel combination strategy inspired by the Law of Total Probability and Conditional Mutual Information Minimization on generative language models. This method has been adapted for the single-attribute control scenario and is termed the Palette of Language Models due to its theoretical linkage between attribute strength and generation style, akin to blending colors on an artist’s palette. Moreover, positive correlation and attribute enhancement are advanced as theoretical properties to guide the design of a rational combination strategy. We conduct experiments in both single-control and multiple-control settings and achieve superior results.

2024

In extremely low-resource relation identification scenarios, small language models (SLMs) tend to overfit, which significantly diminishes their accuracy. Recently, large language models (LLMs) have increasingly been applied to classification tasks by converting the original objective into a generation task via in-context learning. However, the abundance of classifier categories poses challenges in selecting demonstrations. Moreover, the mapping between category labels and textual descriptions requires expensive expert knowledge, thereby constraining the efficacy of in-context learning for LLMs. We maintain that an SLM is optimal for handling classification tasks, and that its shortcomings in the low-resource setting can be mitigated by leveraging an LLM. Hence, we propose a co-evolution strategy for SLMs and LLMs for relation identification. Specifically, the LLM provides essential background knowledge to assist the training process of the SLM classifier, while evaluation metrics from the classifier, in turn, offer valuable insights for refining the generation prompts of the LLM. We conduct experiments on several datasets, which demonstrate the superiority of the proposed model.

Venues

Findings: 3
NAACL: 1
