Yu Wang
Other people with similar names: Yu Wang, Yu Wang, Yu Wang, Yu Wang, Yu Wang, Yu Wang, Yu Wang (王昱) (Hong Kong Polytechnic)
Unverified author pages with similar names: Yu Wang
2026
Eye Movement Features Can Predict Human Preferences on Machine-Generated Texts
Xiaoshan He | Xiaoqun Liu | Haodong He | Yu Wang | Yang Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Eye movements offer valuable insights into human visual attention during the assessment of machine-generated texts, yet existing research and resources in this area are limited. To bridge this gap, we introduce Gaze Responses for Evaluating AI Texts (GREAT), a comprehensive dataset capturing human eye-movement features during screen reading of passages generated by large language models (LLMs). The dataset includes raw eye-movement recordings, reading-time measurements, and post-reading evaluations for LLM-generated passage pairs, alongside rigorous validation metrics. The collected eye-movement features demonstrate strong explanatory power in predicting text quality. When integrated with negative log-likelihood (NLL), a commonly used metric for evaluating text quality, they substantially enhance model performance across all standard statistical criteria. These findings demonstrate that eye movements can act as an effective source of information that complements probabilistic metrics for the task of automatic text quality assessment. The full dataset and some processing code are publicly available at https://github.com/qwurd231/GREAT.
Identifying the Periodicity of Information in Natural Language
Yulin OU | Yu Wang | Yang Xu | Hendrik Buschmeier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent theoretical advances on information density in natural language have raised the following question: To what degree does natural language exhibit periodic patterns in its encoded information? We address this question by introducing a new method called AutoPeriod of Surprisal (APS). APS adopts a canonical periodicity detection algorithm and is able to identify any significant periods that exist in the surprisal sequence of a single document. By applying the algorithm to a set of corpora, we have obtained the following results: First, a considerable proportion of human language demonstrates a strong pattern of periodicity in information; second, new periods that fall outside the distributions of typical structural units in text (e.g., sentence boundaries and elementary discourse units) are found and further confirmed via harmonic regression modeling. We conclude that the periodicity of information in language is a joint outcome of both structural factors and other driving factors that take effect at longer distances. The advantages of our periodicity detection method and its potential for detecting LLM-generated text are further discussed.
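To make the idea of detecting a period in a surprisal sequence concrete, here is a minimal sketch in the spirit of periodogram-plus-autocorrelation period detection. It is not the APS implementation: the function names, the `search` window, and the synthetic "surprisal" signal are all assumptions made for illustration, and the sketch checks only the single strongest periodogram peak without any significance test.

```python
import cmath
import math


def periodogram(x):
    """Power at DFT frequencies k = 1 .. n//2, after removing the mean."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        power[k] = abs(coeff) ** 2 / n
    return power


def autocorrelation(x, max_lag):
    """Normalized (biased) autocorrelation for lags 0 .. max_lag."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


def detect_period(x, search=2):
    """Return a validated period (in tokens) or None.

    Step 1: take the strongest periodogram peak as a candidate period.
    Step 2: accept it only if the autocorrelation has a positive local
            maximum (a "hill") within `search` lags of the candidate.
    """
    if len(set(x)) < 2:  # constant sequence: no variance, no period
        return None
    n = len(x)
    power = periodogram(x)
    k_best = max(power, key=power.get)
    candidate = round(n / k_best)
    acf = autocorrelation(x, n // 2)
    lo = max(2, candidate - search)
    hi = min(len(acf) - 2, candidate + search)
    if lo > hi:
        return None
    lag = max(range(lo, hi + 1), key=lambda l: acf[l])
    on_hill = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
    return lag if on_hill and acf[lag] > 0 else None


# Hypothetical per-token surprisal values oscillating every 8 tokens.
signal = [5 + 2 * math.sin(2 * math.pi * t / 8) for t in range(64)]
print(detect_period(signal))  # prints 8
```

A fuller version would consider several candidate peaks and filter them against a power threshold before the autocorrelation check; that step is omitted here to keep the sketch short.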
Investigating the Representation of Backchannels and Fillers in Fine-tuned Language Models
Yu Wang | Leyi Lao | Langchu Huang | Gabriel Skantze | Yang Xu | Hendrik Buschmeier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Backchannels and fillers are important linguistic expressions in dialogue, but are often treated as "noise" to be bypassed in modern transformer-based language models. Our work studies their representation in language models using three fine-tuning strategies. The models are trained on three dialogue corpora in English and Japanese, where backchannels and fillers are preserved and annotated, to investigate how fine-tuning can help LMs learn their representations. We first apply clustering analysis to the learned representations of backchannels and fillers and find increased silhouette scores in representations from fine-tuned models, which suggests that fine-tuning enables LMs to distinguish the nuanced semantic variation in different backchannel and filler uses. We also use natural language generation (NLG) metrics and qualitative analysis to confirm that the utterances generated by fine-tuned language models resemble human-produced utterances more closely. Our findings suggest the potential of transforming general LMs into conversational LMs that are more capable of producing human-like language.
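As a rough illustration of the silhouette score mentioned in the abstract, the sketch below computes the mean silhouette coefficient for labeled points. The 2-D toy points merely stand in for LM representations, and the function name and the two labelings are hypothetical; the paper's actual embeddings and clustering setup are not reproduced here.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    For each point: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to any other cluster, and the
    coefficient is (b - a) / max(a, b). Singleton clusters score 0.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy stand-ins for token representations (assumed, not from the paper).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = silhouette_score(points, [0, 0, 1, 1])  # labels match geometry
mixed = silhouette_score(points, [0, 1, 0, 1])      # labels cut across it
print(separated > mixed)  # prints True
```

A higher score means the labeled groups are tighter and farther apart, which is the sense in which an increase after fine-tuning is read as better-separated representations.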