Yang Xu
2026
Eye Movement Features Can Predict Human Preferences on Machine-Generated Texts
Xiaoshan He | Xiaoqun Liu | Haodong He | Yu Wang | Yang Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Eye movement offers valuable insights into human visual attention during the assessment of machine-generated texts, yet existing research and resources in this area are limited. To bridge this gap, we introduce Gaze Responses for Evaluating AI Texts (GREAT), a comprehensive dataset capturing human eye-movement features during screen reading of passages generated by large language models (LLMs). The dataset includes raw eye-movement recordings, reading-time measurements, and post-reading evaluations for LLM-generated passage pairs, alongside rigorous validation metrics. The collected eye-movement features demonstrate strong explanatory power in predicting text quality. When integrated with negative log-likelihood (NLL), a commonly used metric for evaluating text quality, they substantially enhance model performance across all standard statistical criteria. These findings demonstrate that eye movement can act as an effective source of information that complements probabilistic metrics for the task of automatic text quality assessment. The full dataset and some processing code are publicly available at https://github.com/qwurd231/GREAT.
Identifying the Periodicity of Information in Natural Language
Yulin OU | Yu Wang | Yang Xu | Hendrik Buschmeier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent theoretical advances on information density in natural language have raised the following question: to what degree does natural language exhibit periodic patterns in its encoded information? We address this question by introducing a new method called AutoPeriod of Surprisal (APS). APS adopts a canonical periodicity detection algorithm and is able to identify any significant periods that exist in the surprisal sequence of a single document. By applying the algorithm to a set of corpora, we obtained the following results: first, a considerable proportion of human language demonstrates a strong pattern of periodicity in information; second, new periods that fall outside the distributions of typical structural units in text (e.g., sentence boundaries, elementary discourse units) are found and further confirmed via harmonic regression modeling. We conclude that the periodicity of information in language is a joint outcome of both structural factors and other driving factors that take effect at longer distances. The advantages of our periodicity detection method and its potential for detecting LLM-generated text are further discussed.
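The abstract names the approach but not its details. As a rough illustration, the sketch below follows the general AutoPeriod recipe of proposing candidate periods from the periodogram and keeping only those whose lag sits on an autocorrelation peak. It is a simplified stand-in rather than the authors' APS implementation: the shuffle-based power threshold, the 95% quantile, and the function names are assumptions, and the input is assumed to be a per-token surprisal sequence already computed by some language model.

```python
import cmath
import random


def periodogram(x):
    """Power at DFT bins k = 1..n//2 of the mean-centred sequence x."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):
        coef = sum(xc[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        power.append((k, abs(coef) ** 2 / n))
    return power


def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def detect_periods(surprisal, n_perm=50, quantile=0.95, seed=0):
    """AutoPeriod-style sketch (assumed design): periodogram candidates validated by ACF peaks."""
    rng = random.Random(seed)
    # 1. Power threshold from shuffled copies, which destroy any periodicity.
    maxima = []
    for _ in range(n_perm):
        shuffled = surprisal[:]
        rng.shuffle(shuffled)
        maxima.append(max(p for _, p in periodogram(shuffled)))
    maxima.sort()
    threshold = maxima[int(quantile * (n_perm - 1))]

    n = len(surprisal)
    periods = set()
    for k, p in periodogram(surprisal):
        if p <= threshold:
            continue
        period = round(n / k)
        # Require room on both sides of the lag for the ACF comparison.
        if period < 2 or period + 1 >= n // 2:
            continue
        # 2. Keep the candidate only if its lag sits on a local ACF peak.
        r = acf(surprisal, period)
        if r > acf(surprisal, period - 1) and r >= acf(surprisal, period + 1):
            periods.add(period)
    return sorted(periods)
```

For a toy sequence that repeats every four tokens, `detect_periods([1.0, 5.0, 2.0, 0.5] * 16)` keeps the four-token period; the two-token harmonic also appears in the periodogram but is rejected because its lag falls on an autocorrelation trough.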
Investigating the Representation of Backchannels and Fillers in Fine-tuned Language Models
Yu Wang | Leyi Lao | Langchu Huang | Gabriel Skantze | Yang Xu | Hendrik Buschmeier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Backchannels and fillers are important linguistic expressions in dialogue, but they are often treated as "noise" to be bypassed in modern transformer-based language models. Our work studies their representation in language models using three fine-tuning strategies. The models are trained on three dialogue corpora in English and Japanese, in which backchannels and fillers are preserved and annotated, to investigate how fine-tuning can help LMs learn their representations. We first apply clustering analysis to the learned representations of backchannels and fillers and find increased silhouette scores in representations from fine-tuned models, which suggests that fine-tuning enables LMs to distinguish nuanced semantic variation across different uses of backchannels and fillers. We also use natural language generation (NLG) metrics and qualitative analysis to confirm that utterances generated by fine-tuned language models more closely resemble human-produced utterances. Our findings suggest the potential of transforming general LMs into conversational LMs that are better able to produce human-like language.
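Silhouette scores measure how much closer each point is to its own cluster than to the nearest other cluster, on a scale from -1 to 1. A minimal pure-Python version of the standard coefficient is sketched below; the 2-D points are toy stand-ins for learned embeddings, and labeling clusters by backchannel or filler category is an assumption about how the clusters would be defined.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    n = len(points)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i in range(n):
        # Mean distance from point i to the other members of each cluster.
        dists = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                dists[c] = sum(math.dist(points[i], points[j]) for j in members) / len(members)
        own = labels[i]
        if own not in dists:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        a = dists[own]  # cohesion: mean distance within own cluster
        b = min(d for c, d in dists.items() if c != own)  # separation: nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / n
```

Two tight, well-separated groups such as `[(0, 0), (0, 1), (10, 0), (10, 1)]` with labels `[0, 0, 1, 1]` score about 0.9, while interleaved labels push the score below zero.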
Think in Sentences: Explicit Sentence Boundaries Enhance Language Model’s Capabilities
Zhichen Liu | Yongyuan Li | Yang Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Researchers have explored ways to improve the capabilities of large language models (LLMs) by inserting dummy tokens into contexts. However, existing work focuses solely on the dummy tokens themselves and fails to leverage the inherent sentence-level structure of natural language. This is a critical oversight, as LLMs acquire linguistic capabilities through exposure to human-generated texts, which are inherently structured at the sentence level. Motivated by this gap, we propose a method that inserts delimiters at sentence boundaries. Our method not only integrates dummy tokens into contexts but also equips LLMs with sentence-by-sentence processing behavior during reasoning. Two approaches, (1) in-context learning and (2) supervised fine-tuning, are evaluated on models ranging from 7B-parameter LLMs to the 600B-parameter DeepSeek-V3. Experimental results demonstrate consistent improvements across various tasks, with notable gains of up to 7.7% on GSM8k and 12.5% on DROP. Furthermore, LLMs fine-tuned with our strategy incorporate sentence awareness into their internal representations. Our work establishes a simple yet effective technique for enhancing LLMs' capabilities and offers promising directions for a cognitively inspired paradigm of LLM enhancement.
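The abstract does not give the delimiter token or the sentence splitter, so both are assumptions in the sketch below: `<sent>` is a hypothetical delimiter, and the regular expression is a deliberately naive boundary rule (end punctuation followed by whitespace). It only illustrates the kind of input transformation described, for example when building in-context examples.

```python
import re

# Hypothetical delimiter token; the paper's actual choice is not stated in the abstract.
DELIMITER = "<sent>"

# Naive boundary rule (assumption): '.', '!' or '?' followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def insert_sentence_delimiters(text, delimiter=DELIMITER):
    """Append the delimiter after every detected sentence."""
    sentences = [s for s in _BOUNDARY.split(text.strip()) if s]
    return " ".join(f"{s} {delimiter}" for s in sentences)
```

For instance, `insert_sentence_delimiters("Tom has 3 apples. He buys 2 more! How many now?")` yields `"Tom has 3 apples. <sent> He buys 2 more! <sent> How many now? <sent>"`. A production splitter would need to handle abbreviations and other edge cases this rule ignores.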
2025
Evaluating Text Generation Quality Using Spectral Distances of Surprisal
Zhichen Liu | Yongyuan Li | Yang Xu | Yu Wang | Yingfang Yuan | Zuhao Yang
Findings of the Association for Computational Linguistics: EMNLP 2025
We propose a novel automatic evaluation metric for open-ended text generation, FACE-2, which substantially improves on a recently developed method, Fourier analysis of cross-entropy (FACE). FACE-2 is a psycholinguistically inspired metric that extracts the dynamic patterns (spectrum) of text surprisal. In open-ended text generation tasks, FACE-2 significantly outperforms a broad set of baseline metrics in revealing the model scaling effect for models of up to 70B parameters, whereas many other existing metrics fail to capture this effect. We also confirm that FACE-2 produces stronger agreement with human preferences on a large human-annotated dataset. We advocate including metrics that mine the dynamics of likelihood when evaluating open-ended text generation, as they cover broader aspects of human language than static likelihood-based or semantic-based metrics alone. Code repository: https://github.com/CLCS-SUSTech/FACEScore.
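To make "spectrum of surprisal" concrete, here is a rough sketch assuming per-token probabilities are already available. Surprisal is computed as -log2 p, each sequence's power spectrum is normalized to sum to 1, and two spectra are compared with total-variation distance. The choice of distance and the truncation-based length alignment are assumptions; the abstract does not specify which spectral distances FACE-2 uses, and the linked repository is the authoritative implementation.

```python
import cmath
import math


def surprisal(token_probs):
    """Surprisal in bits, -log2 p, for each token probability."""
    return [-math.log2(p) for p in token_probs]


def normalized_power_spectrum(x):
    """Power at DFT bins 1..n//2 of mean-centred x, rescaled to sum to 1."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    power = [
        abs(sum(xc[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(1, n // 2 + 1)
    ]
    total = sum(power)
    if total == 0:
        raise ValueError("a constant sequence has no spectrum to compare")
    return [p / total for p in power]


def spectral_distance(s1, s2):
    """Total-variation distance between the spectra of two surprisal sequences (assumed metric)."""
    n = min(len(s1), len(s2))  # crude length alignment by truncation
    p = normalized_power_spectrum(s1[:n])
    q = normalized_power_spectrum(s2[:n])
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Identical sequences give a distance of 0, while sequences whose spectral power falls in disjoint frequency bins (e.g. a period-3 versus a period-2 surprisal pattern) approach the maximum of 1.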
Reasoning for Translation: Comparative Analysis of Chain-of-Thought and Tree-of-Thought Prompting for LLM Translation
Lam Nguyen | Yang Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
As Large Language Models (LLMs) continue to advance in capability, prompt engineering has emerged as a crucial method for optimizing their performance on specialized tasks. While prompting strategies such as Zero-shot, Few-shot, Chain-of-Thought, and Tree-of-Thought have demonstrated significant improvements in reasoning tasks, their application to machine translation has received comparatively little attention. This paper systematically evaluates these prompting techniques across diverse language pairs and domains, measuring their effect on translation quality. Our findings reveal substantial performance variations between prompting methods, with certain strategies offering consistent improvements for specific language directions and complexity levels. These results provide valuable insights for developing more effective LLM-based translation systems without requiring model fine-tuning and complement existing work in the field.
2024
How Much Does Nonverbal Communication Conform to Entropy Rate Constancy?: A Case Study on Listener Gaze in Interaction
Yu Wang | Yang Xu | Gabriel Skantze | Hendrik Buschmeier
Findings of the Association for Computational Linguistics: ACL 2024
According to the Entropy Rate Constancy (ERC) principle, the information density of a text is approximately constant over its length. Whether this principle also applies to nonverbal communication signals is still under investigation. We perform empirical analyses of video-recorded dialogue data and investigate whether listener gaze, as an important nonverbal communication signal, adheres to the ERC principle. Results show (1) that the ERC principle holds for listener gaze; and (2) that two linguistic factors, syntactic complexity and turn transition potential, are weakly correlated with the local entropy of listener gaze.
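One simple way to probe ERC is to average the estimated entropy at each position and check whether it trends up or down with position. The sketch below assumes per-unit entropy values for listener gaze have already been estimated by some model (how the paper defines units and estimates entropy is not stated here), and it reports only an ordinary least-squares slope; a real analysis would also test whether that slope differs significantly from zero.

```python
def mean_by_position(sequences):
    """Average per-unit entropy at each position across sequences of unequal length."""
    longest = max(len(s) for s in sequences)
    means = []
    for i in range(longest):
        values = [s[i] for s in sequences if len(s) > i]
        means.append(sum(values) / len(values))
    return means


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def erc_slope(sequences):
    """Slope of mean entropy against position (1-based); near zero suggests roughly constant entropy."""
    means = mean_by_position(sequences)
    return ols_slope(list(range(1, len(means) + 1)), means)
```

Positions present in only a few sequences are averaged over fewer values, so in practice one might trim long tails before fitting.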
Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood
Yang Xu | Yu Wang | Hao An | Zhichen Liu | Yongyuan Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, this is becoming increasingly difficult as language models' capabilities of generating human-like texts keep evolving. This study provides a new perspective by using relative likelihood values instead of absolute ones and by extracting useful features from the spectral view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, supervised and heuristic-based, which achieves performance competitive with previous zero-shot detection methods and a new state of the art on short-text detection. Our method can also reveal subtle differences between human and model languages, which have theoretical roots in psycholinguistic studies.
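The idea of using relative rather than absolute likelihood can be illustrated by standardizing each text's log-likelihood sequence before taking its spectrum. Z-score normalization is an assumed choice here, and the amplitude spectrum is only a generic feature vector; the paper's supervised and heuristic classifiers are not reproduced.

```python
import cmath
import math


def zscore(values):
    """Standardise within one text so only relative likelihood remains (assumed normalisation)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("a constant sequence cannot be standardised")
    return [(v - mean) / sd for v in values]


def amplitude_spectrum(x):
    """Magnitudes of DFT bins 1..n//2, usable as generic detection features."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(1, n // 2 + 1)
    ]


def relative_likelihood_features(log_likelihoods):
    """Spectral features of the standardised log-likelihood sequence."""
    return amplitude_spectrum(zscore(log_likelihoods))
```

Because of the standardization, shifting or positively rescaling a text's log-likelihoods (for example, scoring it with a model that is uniformly more or less confident) leaves the features unchanged, so only the shape of the likelihood sequence matters.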
2023
Spontaneous gestures encoded by hand positions improve language models: An Information-Theoretic motivated study
Yang Xu | Yang Cheng
Findings of the Association for Computational Linguistics: ACL 2023
The multimodal nature of human communication has been utilized to enhance the performance of language modeling-related tasks. Driven by the development of large-scale end-to-end learning techniques and the availability of multimodal data, it has become possible to represent nonverbal communication behaviors through joint learning and to study their interaction with verbal communication directly. However, gaps remain in existing studies regarding the underlying mechanism by which nonverbal expression contributes to the overall communication purpose. Therefore, we explore two questions using mixed-modal language models trained on monologue video data: first, whether incorporating gesture representations can improve the language model's performance (perplexity); second, whether spontaneous gestures demonstrate entropy rate constancy (ERC), an empirical pattern found in most verbal language data that supports the rational communication assumption from information theory. We have positive and interesting findings for both questions: speakers indeed use spontaneous gestures to convey “meaningful” information that enhances verbal communication, which can be captured with a simple spatial encoding scheme. More importantly, gestures are produced and organized rationally in a similar way to words, which optimizes communication efficiency.