Litu Ou
2026
BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents
Litu Ou | Kuan Li | Huifeng Yin | Liwen Zhang | Zhongwang Zhang | Xixi Wu | Rui Ye | Zile Qiao | Yong Jiang | Pengjun Xie | Fei Huang | Jingren Zhou
Findings of the Association for Computational Linguistics: ACL 2026
Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions is limited. In this paper, we investigate whether LLM-based search agents can communicate their own confidence through verbalized confidence scores after long sequences of actions, a significantly more challenging task than outputting confidence in a single interaction. Experimenting on open-source agentic models, we first find that models exhibit much higher task accuracy at high confidence while having near-zero accuracy when confidence is low. Based on this observation, we propose Test-Time Scaling (TTS) methods that use confidence scores to determine answer quality and encourage the model to try again until it reaches a satisfactory confidence level. Results show that our proposed methods significantly reduce token consumption while demonstrating competitive performance compared to baseline fixed-budget TTS methods.
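The retry idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `confidence_guided_retry`, the `run_agent` callable, the threshold of 0.8, and the cap of 5 attempts are all hypothetical choices made for illustration. The sketch only assumes that one agent rollout returns an answer together with a verbalized confidence in [0, 1].

```python
from typing import Callable, Tuple


def confidence_guided_retry(
    run_agent: Callable[[str], Tuple[str, float]],
    question: str,
    threshold: float = 0.8,   # assumed confidence cutoff, not from the paper
    max_attempts: int = 5,    # assumed retry budget, not from the paper
) -> Tuple[str, float, int]:
    """Rerun the agent until its verbalized confidence reaches the threshold.

    Returns (answer, confidence, attempts_used). If no attempt reaches the
    threshold, the most confident answer seen so far is returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    best_answer, best_conf = "", float("-inf")
    for attempt in range(1, max_attempts + 1):
        answer, conf = run_agent(question)  # one full rollout (hypothetical)
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if conf >= threshold:
            # Stop early: no further rollouts are spent on this question.
            return best_answer, best_conf, attempt
    return best_answer, best_conf, max_attempts
```

Because the loop stops as soon as one rollout is confident enough, easy questions consume a single rollout, whereas a fixed-budget scheme would spend the full budget on every question.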
2025
Context-Aware Hierarchical Merging for Long Document Summarization
Litu Ou | Mirella Lapata
Findings of the Association for Computational Linguistics: ACL 2025
Hierarchical Merging is a technique commonly used to summarize very long texts (>100K tokens) by breaking down the input into smaller sections, summarizing those sections individually, and then merging or combining those summaries into a final coherent summary. Although it helps address the limitations of large language models (LLMs) with fixed input length constraints, the recursive merging process can amplify LLM hallucinations, increasing the risk of factual inaccuracies. In this paper, we seek to mitigate hallucinations by enriching hierarchical merging with context from the source document. Specifically, we propose different approaches to contextual augmentation ranging from *replacing* intermediate summaries with relevant input context, to *refining* them while using the context as supporting evidence, and *aligning* them implicitly (via citations) to the input. Experimental results on datasets representing legal and narrative domains show that contextual augmentation consistently outperforms zero-shot and hierarchical merging baselines for the Llama 3.1 model family. Our analysis further reveals that refinement methods tend to perform best when paired with extractive summarization for identifying relevant input.
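The plain hierarchical merging procedure described in the first sentence of the abstract can be sketched as follows. This is a simplified illustration, not the paper's method: `summarize` and `merge` stand in for LLM calls, `fan_in` is a hypothetical parameter, and the contextual augmentation variants (replacing, refining, aligning) are not shown.

```python
from typing import Callable, List


def hierarchical_merge(
    chunks: List[str],
    summarize: Callable[[str], str],     # hypothetical: summarizes one chunk
    merge: Callable[[List[str]], str],   # hypothetical: merges several summaries
    fan_in: int = 2,                     # assumed number of summaries per merge
) -> str:
    """Summarize each chunk, then merge summaries level by level."""
    if not chunks:
        raise ValueError("chunks must not be empty")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    # Leaf level: one summary per input chunk.
    summaries = [summarize(chunk) for chunk in chunks]

    # Repeatedly merge groups of `fan_in` summaries until one remains.
    while len(summaries) > 1:
        summaries = [
            merge(summaries[i:i + fan_in])
            for i in range(0, len(summaries), fan_in)
        ]
    return summaries[0]
```

Each merge step sees only earlier summaries rather than the source text, which is where errors can compound across levels; the approaches in the abstract address this by bringing source context back into the merge.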