Chong Chen
2026
GALA: Geometric Data Selection with Strategic Prospecting for Large Language Model Self-training
Zhongwei Xie | Ruihao Liao | Zimo Wang | Chong Chen | Xian-Sheng Hua | Xiao Luo
Findings of the Association for Computational Linguistics: ACL 2026
Self-training has emerged as a promising direction for autonomously improving large language models (LLMs). Existing approaches typically adopt a generate-and-filter paradigm based on rejection sampling, which can be inefficient and can yield low-quality reasoning paths. To this end, this paper proposes a novel framework named Geometric Data Selection with Strategic Prospecting (GALA) for LLM self-training. The core idea of GALA is to identify diverse and informative samples from redundant data and exploit them more strategically. Specifically, GALA first clusters latent sentence embeddings and then selects an anchor sample from each cluster based on geometric distance to reduce data redundancy. To further exploit these samples, we conduct strategic brainstorming and reflection to prospect high-quality reasoning trajectories. In addition, we introduce a lightweight dynamic validation module that checks the reliability of mini-batches to ensure overall data quality. Extensive experiments on various benchmarks validate the effectiveness of GALA against several competing baselines.
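The abstract does not name the clustering algorithm or the exact distance used to pick anchors. As an illustration only, the sketch below assumes plain k-means over precomputed sentence embeddings and takes, as each cluster's anchor, the member closest in Euclidean distance to the cluster centroid. The functions `kmeans` and `select_anchors` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (an assumption; GALA's clustering method is not stated)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init from k distinct samples
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each embedding joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_anchors(points: List[Vector], k: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: index of the member nearest each cluster's centroid."""
    labels = kmeans(points, k, seed=seed)
    anchors = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue  # skip empty clusters
        centroid = [sum(col) / len(idx) for col in zip(*(points[i] for i in idx))]
        anchors.append(min(idx, key=lambda i: _dist(points[i], centroid)))
    return sorted(anchors)


# Toy 2-D "embeddings": two well-separated groups of five points each.
pts = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
       (10.0, 10.0), (11.0, 10.0), (9.0, 10.0), (10.0, 11.0), (10.0, 9.0)]
print(select_anchors(pts, k=2))  # → [0, 5]
```

Keeping one representative per cluster is what reduces redundancy here: near-duplicate samples fall into the same cluster and only the most central one survives. Other choices (cosine distance, a different clustering method) would fit the abstract's description equally well.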
2025
LEAF: Large Language Diffusion Model for Time Series Forecasting
Yuhang Pei | Tao Ren | Yifan Wang | Zhipeng Sun | Wei Ju | Chong Chen | Xian-Sheng Hua | Xiao Luo
Findings of the Association for Computational Linguistics: EMNLP 2025
This paper studies the problem of time series forecasting, which aims to generate future predictions given historical trajectories. Recent work has applied large language models (LLMs) to time series forecasting, typically aligning the time series space with the textual space and producing future predictions through strong autoregressive reasoning. Despite this remarkable progress, these approaches usually lack an understanding of holistic temporal patterns and may suffer from error accumulation. To this end, this paper proposes a simple yet effective framework that marries a Large Language Diffusion Model with time series forecasting (LEAF). The core of our framework is to generate future predictions from a holistic view with a diffusion model. In particular, we first introduce a tokenization module to convert time series into tokens and then adopt language diffusion models to capture temporal dependencies. In this way, we can transform a masked time series into the complete set of predictions using a remasking strategy. Extensive experiments on various benchmark datasets validate the effectiveness of LEAF in comparison to various baselines.
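The abstract mentions a remasking strategy without giving its details. As an illustration only, the sketch below assumes a low-confidence remasking rule: in each round a denoiser predicts every masked future token at once, the most confident predictions are committed, and the rest are put back under the mask for the next round. The `Predictor` interface, `remask_decode`, and the toy `trend_predictor` are hypothetical stand-ins, not LEAF's model or tokenizer.

```python
import math
from typing import Callable, List, Optional, Tuple

MASK = None  # marker for a still-masked future position

# Hypothetical denoiser interface: maps the current partially masked token sequence
# to a (predicted token, confidence) pair for every position.
Predictor = Callable[[List[Optional[int]]], List[Tuple[int, float]]]


def remask_decode(history: List[int], horizon: int,
                  predict: Predictor, steps: int) -> List[Optional[int]]:
    """Fill `horizon` masked future tokens in `steps` rounds of low-confidence remasking."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    seq: List[Optional[int]] = list(history) + [MASK] * horizon
    masked = list(range(len(history), len(seq)))  # the whole horizon starts masked
    for step in range(steps):
        if not masked:
            break
        guesses = predict(seq)  # all masked positions are predicted jointly
        # Commit enough positions this round that everything is filled after `steps` rounds.
        n_keep = math.ceil(len(masked) / (steps - step))
        ranked = sorted(masked, key=lambda i: guesses[i][1], reverse=True)
        for i in ranked[:n_keep]:
            seq[i] = guesses[i][0]
        # Lower-confidence predictions are discarded, i.e. those positions are remasked.
        masked = ranked[n_keep:]
    return seq[len(history):]


def trend_predictor(seq: List[Optional[int]]) -> List[Tuple[int, float]]:
    """Toy stand-in: guesses token = position index, more confident near the history."""
    return [(i, 1.0 / (1 + i)) for i in range(len(seq))]


print(remask_decode([0, 1, 2, 3], horizon=3, predict=trend_predictor, steps=3))  # → [4, 5, 6]
```

Unlike autoregressive decoding, every round sees the whole horizon, so later positions can be refined before they are committed; this is one plausible reading of the "holistic view" the abstract describes.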