Mark Zhao
2026
Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval
Junyoung Kim | Anton Korikov | Jiazhou Liang | Justin Cui | Yifan Simon Liu | Qianfeng Wen | Mark Zhao | Scott Sanner
Findings of the Association for Computational Linguistics: ACL 2026
While Large Language Models (LLMs) exhibit exceptional zero-shot relevance modeling, their high computational cost necessitates framing passage retrieval as a budget-constrained global optimization problem. Existing approaches passively rely on first-stage dense retrievers, which leads to two limitations: (1) failing to retrieve relevant passages in semantically distinct clusters, and (2) failing to propagate relevance signals to the broader corpus. To address these limitations, we propose Bayesian Active Learning with Gaussian Processes guided by LLM relevance scoring (BAGEL), a novel framework that propagates sparse LLM relevance signals across the embedding space to guide global exploration. BAGEL models the multimodal relevance distribution across the entire embedding space with a query-specific Gaussian Process (GP) based on LLM relevance scores. Subsequently, it iteratively selects passages for scoring by strategically balancing the exploitation of high-confidence regions with the exploration of uncertain areas. Extensive experiments across four benchmark datasets and two LLM backbones demonstrate that BAGEL effectively explores and captures complex relevance distributions and outperforms LLM reranking methods under the same LLM budget on all four datasets.
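The exploration/exploitation loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the upper-confidence-bound (UCB) acquisition rule, the RBF kernel, the zero-mean prior, the seed choice, and the names `active_select`, `gp_posterior`, and `hypothetical_scores` are all assumptions made for illustration, and the hard-coded scores stand in for real LLM relevance calls.

```python
import math

def rbf(a, b, length_scale=1.0):
    """RBF kernel between two embedding vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_posterior(xs, ys, x, length_scale=1.0, noise=1e-3):
    """Zero-mean GP posterior mean and variance at x given scored points."""
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x, a, length_scale) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x, x, length_scale) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

def active_select(corpus, score_fn, seed_index, budget, beta=2.0, length_scale=1.0):
    """Score up to `budget` passages, picking each next one by UCB (assumed rule)."""
    scored = {seed_index: score_fn(seed_index)}
    while len(scored) < min(budget, len(corpus)):
        xs = [corpus[i] for i in scored]
        ys = [scored[i] for i in scored]
        best, best_ucb = None, -math.inf
        for i, x in enumerate(corpus):
            if i in scored:
                continue
            mean, var = gp_posterior(xs, ys, x, length_scale)
            ucb = mean + beta * math.sqrt(var)  # exploit mean, explore variance
            if ucb > best_ucb:
                best, best_ucb = i, ucb
        scored[best] = score_fn(best)
    return scored

# Toy corpus: two well-separated clusters of 2-D "embeddings".
corpus = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (5.0, 0.0), (5.2, 0.0), (5.4, 0.0)]
# Hypothetical relevance scores standing in for LLM judgments.
hypothetical_scores = [0.3, 0.3, 0.2, 0.9, 0.9, 0.8]

scored = active_select(corpus, lambda i: hypothetical_scores[i], seed_index=0, budget=3)
print(sorted(scored))
```

In this toy setup the seed passage sits in the low-relevance cluster, and the variance term is what lets the loop reach the distant cluster within the budget; a purely greedy rule over the posterior mean would have no such incentive.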
Multimodal Item Scoring for Natural Language Recommendation via Gaussian Process Regression with LLM Relevance Judgments
Yifan Simon Liu | Qianfeng Wen | Jiazhou Liang | Mark Zhao | Justin Cui | Anton Korikov | Armin Toroghi | Junyoung Kim | Scott Sanner
Findings of the Association for Computational Linguistics: ACL 2026
Natural Language Recommendation (NLRec) generates item suggestions based on the relevance between user-issued NL requests and NL item description passages. Existing NLRec approaches often use Dense Retrieval (DR) to compute item relevance scores by aggregating inner products between user request embeddings and relevant passage embeddings. However, DR views the request as the sole relevance label, which yields a unimodal scoring function centered on the query embedding that is often a weak proxy for query relevance. To better capture the potential multimodal distribution of the relevance scoring function that may arise from complex NLRec data, we propose **GPR-LLM**, which uses Gaussian Process Regression (GPR) with LLM relevance judgments for a subset of candidate passages. Experiments on four NLRec datasets and two LLM backbones demonstrate that GPR-LLM with an RBF kernel, capable of modeling multimodal relevance scoring functions, consistently outperforms simpler unimodal kernels (dot product, cosine similarity), as well as baseline methods including DR, cross-encoder, and pointwise LLM-based relevance scoring by up to 65%. Overall, GPR-LLM provides an efficient and effective approach to NLRec within a minimal LLM labeling budget.
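To make the multimodal point concrete, here is a rough sketch of GPR with an RBF kernel fitted to a handful of relevance labels. It is an illustration under stated assumptions, not the paper's code: the zero-mean prior, the noise level, the length scale, the function name `gpr_mean`, and the hard-coded labels (standing in for LLM relevance judgments) are all hypothetical.

```python
import math

def rbf(a, b, length_scale=1.0):
    """RBF kernel between two embedding vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gpr_mean(train_x, train_y, test_x, length_scale=1.0, noise=1e-3):
    """Posterior mean of a zero-mean GP at each test point."""
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    alpha = solve(K, train_y)  # (K + noise * I)^{-1} y
    return [sum(rbf(t, a, length_scale) * w for a, w in zip(train_x, alpha))
            for t in test_x]

# Labeled passages: two relevant regions separated by an irrelevant one.
labeled_x = [(-2.0, 0.0), (0.0, 0.0), (2.0, 0.0)]
labeled_y = [1.0, 0.0, 1.0]  # hypothetical LLM relevance judgments

# Unlabeled passages near each labeled one.
unlabeled_x = [(-1.9, 0.0), (0.1, 0.0), (1.9, 0.0)]
scores = gpr_mean(labeled_x, labeled_y, unlabeled_x, length_scale=0.5)
print([round(s, 3) for s in scores])
```

The predicted scores are high near both relevant regions and low in between, a shape that a single inner product with one query embedding cannot produce because it ranks passages along one direction only.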
2025
MA-DPR: Manifold-aware Distance Metrics for Dense Passage Retrieval
Yifan Liu | Qianfeng Wen | Mark Zhao | Jiazhou Liang | Scott Sanner
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Dense Passage Retrieval (DPR) typically relies on Euclidean or cosine distance to measure query–passage relevance in embedding space, which is effective when embeddings lie on a linear manifold. However, our experiments across DPR benchmarks suggest that embeddings often lie on lower-dimensional, non-linear manifolds, especially in out-of-distribution (OOD) settings, where cosine and Euclidean distance fail to capture semantic similarity. To address this limitation, we propose a *manifold-aware* distance metric for DPR (**MA-DPR**) that models the intrinsic manifold structure of passages using a nearest-neighbor graph and measures query–passage distance based on their shortest path in this graph. We show that MA-DPR outperforms Euclidean and cosine distances by up to **26%** on OOD passage retrieval, with comparable in-distribution performance across various embedding models, while incurring a minimal increase in query inference time. Empirical evidence suggests that manifold-aware distance allows DPR to leverage context from related neighboring passages, making it effective even in the absence of direct semantic overlap. MA-DPR can be applied to a wide range of dense embedding and retrieval tasks, offering potential benefits across a wide spectrum of domains.
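The graph-based distance described in this abstract can be sketched as follows. This is a simplified illustration rather than the published method: Euclidean edge weights, a symmetric k-nearest-neighbor graph, attaching the query as an extra node, and the names `build_knn_graph` and `manifold_distances` are all assumptions.

```python
import heapq
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph: node -> {neighbor: edge length}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: graph distance from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def manifold_distances(query, passages, k=2):
    """Graph distance from the query (added as an extra node) to each passage."""
    points = list(passages) + [query]
    graph = build_knn_graph(points, k)
    dist = shortest_path_lengths(graph, len(points) - 1)
    return [dist.get(i, math.inf) for i in range(len(passages))]

# Toy passages lying on a half circle, i.e. a curved 1-D manifold in 2-D.
passages = [(math.cos(math.radians(30 * i)), math.sin(math.radians(30 * i)))
            for i in range(7)]
query = (1.0, -0.1)

graph_d = manifold_distances(query, passages, k=2)
straight = [euclidean(query, p) for p in passages]
print([round(d, 2) for d in graph_d])
print([round(d, 2) for d in straight])
```

On this toy arc the graph distance to the far end follows the curve through neighboring passages, while the straight-line distance cuts across it. Unreachable passages get an infinite distance here, which a practical system would need to handle with a fallback.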