Zhichen Liu
2026
Think in Sentences: Explicit Sentence Boundaries Enhance Language Model’s Capabilities
Zhichen Liu | Yongyuan Li | Yang Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Researchers have explored ways to improve the capabilities of large language models (LLMs) by inserting dummy tokens into contexts. However, existing work focuses solely on the dummy tokens themselves and fails to leverage the inherent sentence-level structure of natural language. This is a critical oversight, as LLMs acquire linguistic capabilities through exposure to human-generated texts, which are inherently structured at the sentence level. Motivated by this gap, we propose a method that inserts delimiters at sentence boundaries. Our method not only integrates dummy tokens into contexts but also induces sentence-by-sentence processing behavior in LLMs during reasoning. We experiment with two approaches, (1) in-context learning and (2) supervised fine-tuning, on models ranging from 7B LLMs to the 600B DeepSeek-V3. Experimental results demonstrate consistent improvements across various tasks, with notable gains of up to 7.7% on GSM8k and 12.5% on DROP. Furthermore, LLMs fine-tuned with our strategy incorporate sentence awareness into their internal representations. Our work establishes a simple yet effective technique for enhancing LLMs' capabilities, offering promising directions for cognitively inspired LLM enhancement paradigms.
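As a rough illustration of the idea of inserting delimiters at sentence boundaries, the following minimal sketch marks each boundary in a prompt with a delimiter string. It is not the authors' implementation: the delimiter `<sent>`, the function name `insert_sentence_delimiters`, and the naive punctuation-based boundary rule are all assumptions made for this example, since the abstract does not specify them.

```python
import re

# Hypothetical delimiter; the abstract does not say which token is used.
DELIMITER = " <sent> "

# Naive boundary rule (an assumption): whitespace that follows '.', '!' or '?'.
# Real text with abbreviations or decimals would need a proper sentence splitter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def insert_sentence_delimiters(text: str, delimiter: str = DELIMITER) -> str:
    """Return `text` with `delimiter` placed between consecutive sentences."""
    sentences = [s for s in _BOUNDARY.split(text.strip()) if s]
    return delimiter.join(sentences)


if __name__ == "__main__":
    prompt = "Tom has 3 apples. He buys 2 more. How many now?"
    print(insert_sentence_delimiters(prompt))
```

A prompt processed this way could serve as an in-context demonstration or as fine-tuning data, which are the two settings the abstract mentions; how the delimiters are actually used in either setting is described in the paper, not here.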
2025
Evaluating Text Generation Quality Using Spectral Distances of Surprisal
Zhichen Liu | Yongyuan Li | Yang Xu | Yu Wang | Yingfang Yuan | Zuhao Yang
Findings of the Association for Computational Linguistics: EMNLP 2025
We propose a novel automatic evaluation metric for open-ended text generation. It is a substantial improvement on the recently developed method, Fourier analysis of cross-entropy (FACE), and is hence named FACE-2. FACE-2 is a psycholinguistically inspired metric that extracts the dynamic patterns (spectrum) of text surprisal. When examined on open-ended text generation tasks, FACE-2 significantly outperforms a broad set of baseline metrics in revealing the model scaling effect for models of up to 70B parameters, while many existing metrics fail to capture this effect. We have also confirmed the advantage of FACE-2 in producing stronger agreement with human preferences on a large human-annotated dataset. We advocate for including metrics that mine the dynamics of likelihood in evaluating open-ended text generation, as they cover broader aspects of human language than static likelihood-based or semantic-based metrics alone. Code repository: https://github.com/CLCS-SUSTech/FACEScore.
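To make "spectrum of text surprisal" concrete, here is a generic sketch of the underlying signal-processing step: convert per-token probabilities to surprisal, take a magnitude spectrum, and compare two spectra. This is not FACE-2 and does not reproduce the linked repository; the function names, the mean-centering step, the naive DFT, and the Euclidean distance are all assumptions chosen for illustration.

```python
import cmath
import math


def surprisal(token_probs):
    """Per-token surprisal in bits: -log2 p(token | context)."""
    return [-math.log2(p) for p in token_probs]


def magnitude_spectrum(signal):
    """Magnitudes of the one-sided DFT of a mean-centered real signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff))
    return spectrum


def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two spectra of equal length (assumed choice)."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))


if __name__ == "__main__":
    # Made-up token probabilities, used only to exercise the functions.
    human = magnitude_spectrum(surprisal([0.5, 0.25, 0.5, 0.125]))
    model = magnitude_spectrum(surprisal([0.5, 0.5, 0.25, 0.25]))
    print(spectral_distance(human, model))
```

In practice the token probabilities would come from a language model scoring the text, and sequences of different lengths would need resampling or interpolation before their spectra can be compared; both steps are outside this sketch.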
2024
Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood
Yang Xu | Yu Wang | Hao An | Zhichen Liu | Yongyuan Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, this is becoming increasingly difficult as language models' capabilities for generating human-like text keep evolving. This study provides a new perspective by using relative likelihood values instead of absolute ones, and by extracting useful features from the spectrum view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, one supervised and one heuristic-based, which achieves performance competitive with previous zero-shot detection methods and a new state of the art on short-text detection. Our method can also reveal subtle differences between human and model languages, which have theoretical roots in psycholinguistic studies.