Evangelos E. Papalexakis

2026

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains. However, the reliability of responses from LLMs remains an open question. Uncertainty quantification (UQ) of LLMs is crucial for ensuring their reliability, especially in areas such as healthcare. Existing UQ methods, often designed around a single resource such as Natural Language Inference (NLI) or graph-based metrics, fail to capture the multifaceted nature of uncertainty in natural language generation. In this work, we propose MS-UQ, a novel Multi-Resource Uncertainty Quantification framework that integrates heterogeneous uncertainty signals into a unified measure. Our approach concatenates matrices from diverse resources and employs tensor decomposition to orthogonally disentangle unique and shared information. To ensure scalability, we construct an adaptive ensemble of outputs from different decomposition methods, enabling the incorporation of new uncertainty sources. Experiments on CoQA, NQ_Open, and HotpotQA demonstrate that MS-UQ consistently outperforms existing methods, offering a comprehensive and scalable solution for uncertainty estimation in black-box LLMs and a more robust framework for enhancing LLM reliability in high-stakes applications. Our code can be accessed at https://anonymous.4open.science/r/MDUQ-First-202E/README.md.

Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition
Tiejin Chen | Huaiyuan Yao | Jia Chen | Evangelos E. Papalexakis | Hua Wei
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While Large Language Model-based Multi-Agent Systems (MAS) consistently outperform single-agent systems on complex tasks, their intricate interactions introduce critical reliability challenges arising from communication dynamics and role dependencies. Existing Uncertainty Quantification methods, typically designed for single-turn outputs, fail to address the unique complexities of MAS. Specifically, these methods struggle with three distinct challenges: the cascading uncertainty in multi-step reasoning, the variability of inter-agent communication paths, and the diversity of communication topologies. To bridge this gap, we introduce MATU, a novel framework that quantifies uncertainty through tensor decomposition. MATU moves beyond analyzing final text outputs by representing entire reasoning trajectories as embedding matrices and organizing multiple execution runs into a higher-order tensor. By applying tensor decomposition, we disentangle and quantify distinct sources of uncertainty, offering a comprehensive reliability measure that is generalizable across different agent structures. We provide comprehensive experiments to show that MATU effectively estimates holistic and robust uncertainty across diverse tasks and communication topologies.

2025

ExpertGenQA: Open-ended QA generation in Specialized Domains
Haz Sameen Shahgir | Chansong Lim | Jia Chen | Evangelos E. Papalexakis | Yue Dong
Findings of the Association for Computational Linguistics: EMNLP 2025

Generating high-quality question–answer (QA) pairs for specialized technical domains is essential for advancing knowledge comprehension, yet remains challenging. Existing methods often yield generic or shallow questions that fail to reflect the depth and structure of expert-written examples. We propose ExpertGenQA, a generation protocol that combines few-shot prompting with dual categorization by topic and question style to produce more diverse and cognitively meaningful QA pairs. ExpertGenQA achieves twice the efficiency of standard few-shot methods while maintaining 94.4% topic coverage. Unlike LLM-based judges, which often favor surface fluency, Bloom’s Taxonomy analysis shows that ExpertGenQA better captures expert-level cognitive complexity. When used to train retrieval systems, our questions improve top-1 accuracy by 13.02%, demonstrating their practical value for domain-specific applications.
