Yixuan Liu

2026

AI Agents for the Science of Science: A Survey of Tasks, Architectures, Evaluations, and Challenges
Yixuan Liu | Yicheng Zhang
Findings of the Association for Computational Linguistics: ACL 2026

The Science of Science (SciSci) examines how scientific knowledge is generated, evaluated, and transformed, using large-scale scholarly and bibliometric data. As these data grow in scale and complexity, analysis has increasingly relied on statistical, network-based, and machine learning methods, and now increasingly involves AI agents. This emerging class of agents, ranging from multi-agent simulations of scientific behavior to tool-augmented systems for empirical analysis, is beginning to reshape how SciSci research is conducted. In this survey, we propose a task-centered taxonomy that distinguishes *agents as simulations*, which model citation, collaboration, and community dynamics, from *agents as tools*, which assist empirical analysis and scientific workflows. We review agent architectures, learning mechanisms, evaluation, and SciSci benchmarks, and examine open challenges related to reliability, data quality, and bias. Our survey aims to clarify the landscape of AI agents in SciSci and to support the development of reliable and scientifically useful AI systems for studying science and scientific communities.

2025

Unequal Scientific Recognition in the Age of LLMs
Yixuan Liu | Abel Elekes | Jianglin Lu | Rodrigo Dorantes-Gilardi | Albert-Laszlo Barabasi
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) are reshaping how scientific knowledge is accessed and represented. This study evaluates the extent to which popular and frontier LLMs, including GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, recognize scientists, benchmarking their outputs against OpenAlex and Wikipedia. Using a dataset of 100,000 physicists from OpenAlex, we uncover substantial disparities: LLMs exhibit selective and inconsistent recognition patterns. Recognition correlates strongly with measures of scholarly impact, such as citations, and remains uneven across gender and geography. Women researchers and researchers from Africa, Asia, and Latin America are significantly underrecognized. We further examine the role of training data provenance, identifying Wikipedia as a potential source that contributes to recognition gaps. Our findings highlight how LLMs can reflect, and potentially amplify, existing disparities in science, underscoring the need for more transparent and inclusive knowledge systems.

2022

Beyond the Granularity: Multi-Perspective Dialogue Collaborative Selection for Dialogue State Tracking
Jinyu Guo | Kai Shuang | Jijie Li | Zihan Wang | Yixuan Liu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In dialogue state tracking, dialogue history is a crucial source of information, and how it is used varies between models. However, no matter how the dialogue history is used, each existing model relies on the same dialogue history throughout the entire state tracking process, regardless of which slot is being updated. Yet updating different slots in different turns requires different parts of the dialogue history. Using the same dialogue contents for every slot may therefore provide insufficient or redundant information, which affects overall performance. To address this problem, we devise DiCoS-DST to dynamically select the relevant dialogue contents corresponding to each slot for state updating. Specifically, it first retrieves turn-level utterances from the dialogue history and evaluates their relevance to the slot from a combination of three perspectives: (1) explicit connection to the slot name; (2) relevance to the current turn dialogue; (3) Implicit Mention Oriented Reasoning. These perspectives are then combined to yield a decision, and only the selected dialogue contents are fed into the State Generator, which explicitly minimizes the distracting information passed to the downstream state prediction. Experimental results show that our approach achieves new state-of-the-art performance on MultiWOZ 2.1 and MultiWOZ 2.2, and achieves superior performance on multiple mainstream benchmark datasets (including Sim-M, Sim-R, and DSTC2).
