Yunxin Liu
2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
Maosong Cao | Taolin Zhang | Mo Li | Chuyu Zhang | Yunxin Liu | Haodong Duan | Songyang Zhang | Kai Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The quality of Supervised Fine-Tuning (SFT) data plays a critical role in enhancing the conversational capabilities of Large Language Models (LLMs). However, the availability of high-quality human-annotated SFT data has become a significant bottleneck for LLMs, necessitating a greater reliance on synthetic training data. In this work, we introduce Condor, a two-stage synthetic data generation framework that incorporates World Knowledge Trees and Self-Reflection Refinement to produce high-quality SFT data at scale. Our experimental results demonstrate that a base model fine-tuned on only 20K Condor-generated samples achieves superior performance compared to an instruct model trained with RLHF. The additional refinement stage in Condor further enables iterative self-improvement for LLMs at various scales (up to 72B), validating the effectiveness of our approach. Furthermore, our investigation into the scaling of synthetic data in post-training reveals substantial unexplored potential for performance improvements, opening promising avenues for future research.
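A minimal sketch of the two-stage idea described in the abstract, assuming a generic `generate(prompt)` LLM call; the tree contents, prompt wording, and all function names below are illustrative assumptions, not the paper's actual implementation:

```python
# Condor-style two-stage synthesis, sketched (illustrative only).
# `generate(prompt)` stands in for any LLM completion call.
import random

# Assumed toy stand-in for a World Knowledge Tree (domain -> leaf topics).
WORLD_KNOWLEDGE_TREE = {
    "science": ["astronomy", "genetics"],
    "culture": ["film", "cuisine"],
}

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def synthesize_sample() -> dict:
    # Stage 1: knowledge-driven synthesis -- sample a leaf topic from the
    # tree, then ask the model for a question and an initial answer.
    domain = random.choice(list(WORLD_KNOWLEDGE_TREE))
    topic = random.choice(WORLD_KNOWLEDGE_TREE[domain])
    question = generate(f"Write a challenging user question about {topic}.")
    answer = generate(f"Answer the question helpfully:\n{question}")

    # Stage 2: self-reflection refinement -- the model critiques its own
    # answer, then rewrites it conditioned on that critique.
    critique = generate(
        f"Critique this answer for accuracy and helpfulness:\n"
        f"Q: {question}\nA: {answer}"
    )
    refined = generate(
        f"Rewrite the answer, addressing the critique.\n"
        f"Q: {question}\nA: {answer}\nCritique: {critique}"
    )
    return {"instruction": question, "response": refined}
```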
An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint
Yi Sun | Han Wang | Jiaqiang Li | Jiacheng Liu | Xiangyu Li | Hao Wen | Yizhen Yuan | Huiwen Zheng | Yan Liang | Yuanchun Li | Yunxin Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recent work has demonstrated the remarkable potential of Large Language Models (LLMs) in test-time scaling. By making models think before answering, they are able to achieve much higher accuracy with extra inference computation. However, in many real-world scenarios, models are used under time constraints, where an answer must be given within a certain output length. It is unclear whether and how the reasoning ability of different LLMs remains effective under such strict constraints. We take a first look at this problem by conducting an in-depth empirical study. Specifically, we test 30 LLMs on common reasoning datasets under a wide range of output length budgets, and we analyze the correlation between inference accuracy and various properties, including model type, model size, and prompt style. We also consider the mappings between token budgets and actual on-device latency budgets. The results demonstrate several interesting findings about budget-aware LLM reasoning ability that differ from the unconstrained situation, e.g., the optimal choice of model size or prompt style changes under different budgets. These findings offer a timely evaluation of this area and practical guidance for users deploying LLMs under real-world latency constraints.
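A minimal sketch of a budget-aware evaluation loop of the kind the study describes, assuming a generic `model_generate` call that accepts a hard cap on output tokens; the dataset fields and the answer-extraction rule are assumptions, not the paper's protocol:

```python
# Budget-aware reasoning evaluation, sketched (illustrative only).
def model_generate(prompt: str, max_new_tokens: int) -> str:
    raise NotImplementedError("plug in your LLM client here")

def accuracy_under_budget(dataset, budget: int) -> float:
    """Fraction of questions answered correctly within `budget` output tokens."""
    correct = 0
    for item in dataset:  # assumed item shape: {"question": str, "answer": str}
        output = model_generate(
            f"{item['question']}\nThink step by step, then give the final "
            f"answer on the last line.",
            max_new_tokens=budget,  # generation is hard-cut at the budget
        )
        # If the budget truncates the chain of thought, the final line may
        # be missing or wrong -- exactly the effect the study measures.
        final_line = output.strip().splitlines()[-1] if output.strip() else ""
        correct += item["answer"] in final_line
    return correct / len(dataset)

# Sweeping budgets shows where extra thinking stops paying off:
# for b in (64, 128, 256, 512, 1024):
#     print(b, accuracy_under_budget(eval_set, b))
```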
2024
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget
Rui Kong | Yuanchun Li | Qingtian Feng | Weijun Wang | Xiaozhou Ye | Ye Ouyang | Linghe Kong | Yunxin Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Mixture of experts (MoE) is a popular technique to improve the capacity of Large Language Models (LLMs) with conditionally activated parallel experts. However, serving MoE models on memory-constrained devices is challenging due to the large parameter size. Typical solutions such as memory swapping or expert pruning may lead to significantly higher latency or severe accuracy loss. In this paper, we introduce SwapMoE, a framework for efficient serving of MoE-based large language models with tunable memory budgets. The main idea of SwapMoE is to keep a small dynamic set of important experts, namely Virtual Experts, in the main memory for inference, while seamlessly maintaining the mapping from Virtual Experts to the actual experts. Experiments have shown that SwapMoE can reduce the memory footprint while maintaining reasonable accuracy. For example, on text summarization tasks with Switch Transformer, SwapMoE can reduce the memory consumption from 14.2 GiB to 4.7 GiB, together with a 50% latency reduction and a slight Rouge-2 score drop of 0.041.
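A minimal sketch of the Virtual Experts idea as the abstract describes it: keep a small, slowly updated subset of experts resident in memory and map every routing decision onto a resident expert. The importance scoring, update policy, and all names here are assumptions, not SwapMoE's actual code:

```python
# Virtual Experts, sketched (illustrative only).
class VirtualExpertPool:
    def __init__(self, all_experts, memory_budget: int):
        self.all_experts = all_experts  # full expert set parked on disk/CPU
        self.budget = memory_budget     # how many experts fit in main memory
        self.resident = {}              # expert_id -> loaded expert weights

    def update_residents(self, importance: dict):
        """Periodically reload the `budget` most important experts.

        Done off the critical path, so inference never waits on a swap-in.
        """
        keep = sorted(importance, key=importance.get, reverse=True)[: self.budget]
        self.resident = {i: self.all_experts[i] for i in keep}

    def route(self, expert_id: int):
        """Map the router's chosen expert onto a resident Virtual Expert."""
        assert self.resident, "call update_residents() before routing"
        if expert_id in self.resident:
            return self.resident[expert_id]
        # Fall back to the highest-importance resident expert (first key,
        # since update_residents inserts them in importance order).
        return self.resident[next(iter(self.resident))]
```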
2023
FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning
Jaemin Shin | Hyungjun Yoon | Seungjoo Lee | Sungjoon Park | Yunxin Liu | Jinho Choi | Sung-Ju Lee
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Psychiatrists diagnose mental disorders based on patients’ use of language. Still, due to data privacy, existing passive mental health monitoring systems rely on alternative features such as activity, app usage, and location collected via mobile devices. We propose FedTherapist, a mobile mental health monitoring system that utilizes continuous speech and keyboard input in a privacy-preserving way via federated learning. To overcome the complex nature of on-device language model training on smartphones, we explore multiple model designs for FedTherapist, comparing their performance and overhead. We further propose a Context-Aware Language Learning (CALL) methodology to effectively utilize smartphones’ large volumes of noisy text for mental health signal sensing. Our IRB-approved evaluation on the prediction of self-reported depression, stress, anxiety, and mood from 46 participants shows that FedTherapist achieves higher accuracy than models using non-language features, with a 0.15 AUROC improvement and an 8.21% MAE reduction.
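A minimal sketch of the privacy-preserving pattern FedTherapist builds on, using a generic FedAvg-style weight average (a standard federated-learning technique, not the paper's specific training procedure); the function names and data shapes are assumptions:

```python
# Federated round, sketched (illustrative only): the user's speech and
# keyboard text never leave the device -- only weight updates do.
from typing import Dict
import numpy as np

def client_update(global_weights: Dict[str, np.ndarray],
                  local_texts) -> Dict[str, np.ndarray]:
    """Train locally on the user's private text; return updated weights."""
    raise NotImplementedError("on-device training step goes here")

def federated_round(global_weights: Dict[str, np.ndarray],
                    clients) -> Dict[str, np.ndarray]:
    # Each client trains on its own data; the server sees only weights,
    # which it averages parameter-by-parameter into the new global model.
    updates = [client_update(global_weights, c) for c in clients]
    return {
        name: np.mean([u[name] for u in updates], axis=0)
        for name in global_weights
    }
```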
Co-authors
- Yuanchun Li 2
- Maosong Cao 1
- Kai Chen 1
- Jinho D. Choi 1
- Haodong Duan 1
- Qingtian Feng 1
- Rui Kong 1
- Linghe Kong 1
- Seungjoo Lee 1
- Sung-Ju Lee 1
- Mo Li 1
- Jiaqiang Li 1
- Xiangyu Li 1
- Yan Liang 1
- Jiacheng Liu 1
- Ye Ouyang 1
- Sungjoon Park 1
- Jaemin Shin 1
- Yi Sun 1
- Weijun Wang 1
- Han Wang (王涵) 1
- Hao Wen 1
- Xiaozhou Ye 1
- Hyungjun Yoon 1
- Yizhen Yuan 1
- Taolin Zhang 1
- Chuyu Zhang 1
- Songyang Zhang 1
- Huiwen Zheng 1