Yifan Simon Liu
Also published as: Yifan Liu
Other people with similar names: Yifan Liu, Yifan Liu, Yifan Liu
Unverified author pages with similar names: Yifan Liu
2026
Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval
Junyoung Kim | Anton Korikov | Jiazhou Liang | Justin Cui | Yifan Simon Liu | Qianfeng Wen | Mark Zhao | Scott Sanner
Findings of the Association for Computational Linguistics: ACL 2026
While Large Language Models (LLMs) exhibit exceptional zero-shot relevance modeling, their high computational cost necessitates framing passage retrieval as a budget-constrained global optimization problem. Existing approaches passively rely on first-stage dense retrievers, which leads to two limitations: (1) failing to retrieve relevant passages in semantically distinct clusters, and (2) failing to propagate relevance signals to the broader corpus. To address these limitations, we propose Bayesian Active Learning with Gaussian Processes guided by LLM relevance scoring (BAGEL), a novel framework that propagates sparse LLM relevance signals across the embedding space to guide global exploration. BAGEL models the multimodal relevance distribution across the entire embedding space with a query-specific Gaussian Process (GP) based on LLM relevance scores. Subsequently, it iteratively selects passages for scoring by strategically balancing the exploitation of high-confidence regions with the exploration of uncertain areas. Extensive experiments across four benchmark datasets and two LLM backbones demonstrate that BAGEL effectively explores and captures complex relevance distributions and outperforms LLM reranking methods under the same LLM budget on all four datasets.
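The abstract describes an iterative loop: fit a query-specific GP to the LLM scores collected so far, then pick the next passage by balancing high predicted relevance against high uncertainty. The Python sketch below illustrates that kind of loop under stated assumptions and is not the authors' implementation. The upper-confidence-bound rule `mu + beta * sd`, the RBF kernel, the dense-retriever seeding, and all names (`active_llm_scoring`, `llm_score`, `beta`, `budget`) are hypothetical choices, and `llm_score` is a toy stand-in for an LLM relevance call.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # RBF kernel computed from pairwise squared Euclidean distances
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / ls ** 2)

def gp_posterior(X_obs, y_obs, X, ls=1.0, noise=1e-2):
    # Zero-mean GP posterior mean and standard deviation at X
    K = rbf(X_obs, X_obs, ls) + noise * np.eye(len(X_obs))
    L = np.linalg.cholesky(K)
    K_s = rbf(X, X_obs, ls)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s.T)
    var = np.maximum(1.0 - (v ** 2).sum(0), 0.0)  # RBF prior variance is 1
    return K_s @ alpha, np.sqrt(var)

def active_llm_scoring(X, query, llm_score, budget, n_seed=2, beta=2.0, ls=1.0):
    # Seed with the dense retriever's top passages (inner product with query)
    seeds = np.argsort(-(X @ query))[:n_seed]
    scored = {int(i): llm_score(int(i)) for i in seeds}
    while len(scored) < budget:
        keys = list(scored)
        idx = np.array(keys)
        y = np.array([scored[k] for k in keys], dtype=float)
        mu, sd = gp_posterior(X[idx], y, X, ls)
        # Assumption: UCB acquisition; BAGEL's actual rule may differ
        ucb = mu + beta * sd          # exploit high mean, explore high uncertainty
        ucb[idx] = -np.inf            # never spend budget on a passage twice
        nxt = int(np.argmax(ucb))
        scored[nxt] = llm_score(nxt)  # one (hypothetical) LLM call per step
    return scored

# Toy embeddings: cluster A (near the query), cluster B, cluster C
X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [5, 0], [5.1, 0], [5, 0.1],
              [0, 5], [0.1, 5], [0, 5.1]], dtype=float)
query = np.array([-1.0, -1.0])
relevant = set(range(6))  # clusters A and B are relevant, C is not

def llm_score(i):
    # Stand-in for an expensive LLM relevance judgment
    return 1.0 if i in relevant else 0.0

scored = active_llm_scoring(X, query, llm_score, budget=5)
```

In this toy data the dense retriever seeds only passages from cluster A, yet the uncertainty term pushes the loop to spend budget on the two distant clusters, so the relevant cluster B, which a purely first-stage ranking would not reach within the seeds, still gets scored.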
Evaluating Scene-based In-Situ Item Labeling for Immersive Conversational Recommendation
Jiazhou Liang | Yifan Simon Liu | David Guo | Yilun Jiang | Minqi Sun | Scott Sanner
Findings of the Association for Computational Linguistics: ACL 2026
The growing ubiquity of Extended Reality (XR) is driving Conversational Recommendation Systems (CRS) toward visually immersive experiences. We formalize this paradigm as Immersive CRS (ICRS), where recommended items are highlighted directly in the user’s scene-based visual environment and augmented with in-situ labels. While item recommendation has been widely studied, how to select and evaluate the information presented in immersive labels remains an open problem. To this end, we introduce a principled categorization of information needs into explicit intent satisfaction and proactive information needs, and use it to define novel evaluation metrics for item label selection. We benchmark IR-, LLM-, and VLM-based methods across three datasets and ICRS scenarios: fashion, movie recommendation, and retail shopping. Our evaluation reveals three important limitations of existing methods: (1) they fail to leverage scenario-specific information modalities (e.g., visual cues for fashion, metadata for retail), (2) they present redundant information that is visually inferable, and (3) they poorly anticipate users’ proactive information needs from explicit dialogue alone. In summary, this work both provides a novel evaluation paradigm for in-situ item labeling in ICRS and highlights key challenges for future work.
Multimodal Item Scoring for Natural Language Recommendation via Gaussian Process Regression with LLM Relevance Judgments
Yifan Simon Liu | Qianfeng Wen | Jiazhou Liang | Mark Zhao | Justin Cui | Anton Korikov | Armin Toroghi | Junyoung Kim | Scott Sanner
Findings of the Association for Computational Linguistics: ACL 2026
Natural Language Recommendation (NLRec) generates item suggestions based on the relevance between user-issued NL requests and NL item description passages. Existing NLRec approaches often use Dense Retrieval (DR) to compute item relevance scores from aggregation of inner products between user request embeddings and relevant passage embeddings. However, DR views the request as the sole relevance label, thus leading to a unimodal scoring function centered on the query embedding that is often a weak proxy for query relevance. To better capture the potential multimodal distribution of the relevance scoring function that may arise from complex NLRec data, we propose **GPR-LLM** that uses Gaussian Process Regression (GPR) with LLM relevance judgments for a subset of candidate passages. Experiments on four NLRec datasets and two LLM backbones demonstrate that GPR-LLM with an RBF kernel, capable of modeling multimodal relevance scoring functions, consistently outperforms simpler unimodal kernels (dot product, cosine similarity), as well as baseline methods including DR, cross-encoder, and pointwise LLM-based relevance scoring by up to 65%. Overall, GPR-LLM provides an efficient and effective approach to NLRec under a minimal LLM labeling budget.
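A minimal sketch of the core idea, assuming a zero-mean GP with an RBF kernel over passage embeddings and a handful of LLM relevance labels (toy 0/1 values here). The function names, the `lengthscale` and `noise` values, and the 2-D data are illustrative assumptions rather than the paper's configuration, and the step that aggregates passage scores into item scores is omitted.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale ** 2)

def gpr_scores(X_lab, y_lab, X_all, lengthscale=1.0, noise=1e-2):
    """GP posterior mean relevance for every passage in X_all (zero prior mean)."""
    K = rbf_kernel(X_lab, X_lab, lengthscale) + noise * np.eye(len(X_lab))
    alpha = np.linalg.solve(K, y_lab)  # K^{-1} y
    return rbf_kernel(X_all, X_lab, lengthscale) @ alpha

# Toy 2-D "embeddings": two separate relevant regions and one irrelevant
# point between them; the labels stand in for LLM relevance judgments.
X_lab = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 0.0]])
y_lab = np.array([1.0, 1.0, 0.0])
X_all = np.array([[-3.0, 0.1], [3.0, -0.1], [0.0, 0.2]])
scores = gpr_scores(X_lab, y_lab, X_all)
```

The posterior mean is high near both labeled-relevant regions and close to zero between them. A scoring function that decays with distance from a single query embedding cannot produce that two-peaked shape, which is the multimodality the RBF kernel is meant to capture.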
2025
A Comparative Study of Static and Contextual Embeddings for Analyzing Semantic Changes in Medieval Latin Charters
Yifan Liu | Gelila Tilahun | Xinxiang Gao | Qianfeng Wen | Michael Gervers
Proceedings of the First Workshop on Language Models for Low-Resource Languages
The Norman Conquest of 1066 C.E. brought profound transformations to England’s administrative, societal, and linguistic practices. The DEEDS (Documents of Early England Data Set) database offers a unique opportunity to explore these changes by examining shifts in word meanings within a vast collection of Medieval Latin charters. While computational linguistics typically relies on vector representations of words like static and contextual embeddings to analyze semantic changes, existing embeddings for scarce and historical Medieval Latin are limited and may not be well-suited for this task. This paper presents the first computational analysis of semantic change pre- and post-Norman Conquest and the first systematic comparison of static and contextual embeddings in a scarce historical data set. Our findings confirm that, consistent with existing studies, contextual embeddings outperform static word embeddings in capturing semantic change within a scarce historical corpus.
MA-DPR: Manifold-aware Distance Metrics for Dense Passage Retrieval
Yifan Liu | Qianfeng Wen | Mark Zhao | Jiazhou Liang | Scott Sanner
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Dense Passage Retrieval (DPR) typically relies on Euclidean or cosine distance to measure query–passage relevance in embedding space, which is effective when embeddings lie on a linear manifold. However, our experiments across DPR benchmarks suggest that embeddings often lie on lower-dimensional, non-linear manifolds, especially in out-of-distribution (OOD) settings, where cosine and Euclidean distance fail to capture semantic similarity. To address this limitation, we propose a *manifold-aware* distance metric for DPR (**MA-DPR**) that models the intrinsic manifold structure of passages using a nearest-neighbor graph and measures query–passage distance based on their shortest path in this graph. We show that MA-DPR outperforms Euclidean and cosine distances by up to **26%** on OOD passage retrieval, with comparable in-distribution performance across various embedding models, while incurring a minimal increase in query inference time. Empirical evidence suggests that manifold-aware distance allows DPR to leverage context from related neighboring passages, making it effective even in the absence of direct semantic overlap. MA-DPR can be applied to a wide range of dense embedding and retrieval tasks, offering potential benefits across many domains.
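The manifold-aware distance can be sketched as a shortest-path computation on a k-nearest-neighbor graph. This Python version is an illustrative assumption rather than the authors' code: it uses Euclidean edge weights and a brute-force, symmetrized kNN graph, inserts the query as an extra node, and runs Dijkstra from it. `knn_graph`, `manifold_distances`, and `k` are hypothetical names.

```python
import heapq
import numpy as np

def knn_graph(X, k):
    # Symmetrized k-nearest-neighbor graph with Euclidean edge weights
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    adj = [dict() for _ in range(len(X))]
    for i in range(len(X)):
        neighbors = [int(j) for j in np.argsort(D[i]) if j != i][:k]
        for j in neighbors:
            adj[i][j] = adj[j][i] = float(D[i, j])
    return adj

def manifold_distances(P, q, k=2):
    # Assumption: the query joins the graph as one extra node
    X = np.vstack([P, q[None, :]])
    adj = knn_graph(X, k)
    src = len(P)
    dist = [np.inf] * len(X)
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:  # Dijkstra from the query node
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return np.array(dist[:src])  # passages only; unreachable stays inf

# Toy U-shaped "manifold": bottom arm, connector point, top arm
P = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [4.6, 1.25],
              [4, 2.5], [3, 2.5], [2, 2.5], [1, 2.5], [0, 2.5]], dtype=float)
q = np.array([-0.5, 0.0])
md = manifold_distances(P, q, k=2)
eu = np.linalg.norm(P - q, axis=1)
```

On this U-shaped toy set the passage at (0, 2.5) is Euclidean-close to the query but sits at the far end of the curve, so its graph distance ranks it last. A larger system would likely replace the O(n²) distance matrix with an approximate nearest-neighbor index.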