Luke Yoffe
2025
DebUnc: Improving Large Language Model Agent Communication With Uncertainty Metrics
Luke Yoffe | Alfonso Amayuelas | William Yang Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Multi-agent debates have been introduced to improve the accuracy of Large Language Models (LLMs) by having multiple agents discuss solutions to a problem over several rounds of debate. However, models often generate incorrect yet confident-sounding responses, which can mislead the other agents. This issue arises partly because agents do not consider how confident their peers are. To address this, we propose DebUnc, a debate framework that uses uncertainty metrics to assess agent confidence. Confidence is then conveyed either through textual prompts or via a modified attention mechanism that adjusts token weights. Evaluations across benchmarks show that attention-based methods are particularly effective and that performance continues to improve as uncertainty estimation becomes more reliable. The code is available at https://github.com/lukeyoffe/debunc.
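As a rough illustration of the idea (not the paper's implementation; see the linked repository for the actual metrics and the attention modification), a minimal sketch of one common uncertainty metric, mean token entropy over a generated response, and the prompt-based way of conveying the resulting confidence. The function names and the `max_entropy` normalization constant are assumptions for this sketch.

```python
import numpy as np

def mean_token_entropy(logits: np.ndarray) -> float:
    """Mean per-token entropy over a generated sequence.

    logits: array of shape (seq_len, vocab_size) with raw model scores.
    Higher entropy means the model was less certain while generating.
    """
    # Numerically stable softmax over the vocabulary dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)  # (seq_len,)
    return float(entropy.mean())

def confidence_prefix(logits: np.ndarray, max_entropy: float = 5.0) -> str:
    """Map uncertainty to a [0, 1] confidence score and render it as a
    textual prefix attached to the agent's message in the next debate
    round (the prompt-based variant). max_entropy is a hypothetical
    normalization constant, not a value from the paper."""
    conf = max(0.0, 1.0 - mean_token_entropy(logits) / max_entropy)
    return f"[confidence: {conf:.2f}] "
```

The attention-based variant described in the abstract would instead use such a score to reweight how much attention other agents pay to that message's tokens, rather than stating the confidence in text.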
2022
FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue
Alon Albalak | Yi-Lin Tuan | Pegah Jandaghi | Connor Pryor | Luke Yoffe | Deepak Ramachandran | Lise Getoor | Jay Pujara | William Yang Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Task transfer, transferring knowledge contained in related tasks, holds the promise of reducing the quantity of labeled data required to fine-tune language models. Dialogue understanding encompasses many diverse tasks, yet task transfer has not been thoroughly studied in conversational AI. This work explores conversational task transfer by introducing FETA: a benchmark for FEw-sample TAsk transfer in open-domain dialogue. FETA contains two underlying sets of conversations upon which there are 10 and 7 tasks annotated, enabling the study of intra-dataset task transfer: task transfer without domain adaptation. We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs and create a baseline for future work. We run experiments in the single- and multi-source settings and report valuable findings, e.g., most performance trends are model-specific, and span extraction and multiple-choice tasks benefit the most from task transfer. In addition to task transfer, FETA can be a valuable resource for future research into the efficiency and generalizability of pre-training datasets and model architectures, as well as for learning settings such as continual and multitask learning.
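For readers unfamiliar with the setup, a minimal sketch of the single-source transfer protocol the abstract describes: fine-tune on a source task, then fine-tune again on the few labeled target samples. This is a generic PyTorch illustration under assumed interfaces (`source_task_batches` and `few_target_batches` are placeholder data loaders), not FETA's actual training code.

```python
import torch
from torch import nn, optim

def finetune(model: nn.Module, batches, lr: float = 1e-4, epochs: int = 1):
    """Generic fine-tuning loop; batches yield (inputs, labels) tensors."""
    opt = optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in batches:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model

# Single-source task transfer (hypothetical loaders): adapt on the
# source task first, then on the few labeled target-task samples.
# model = finetune(model, source_task_batches)
# model = finetune(model, few_target_batches)
```

The multi-source setting extends this by fine-tuning on several source tasks before the final few-sample target adaptation.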