Caleb Chen Cao
2026
InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents
Zhenghao Zhu | Yuanfeng Song | Xing Chen | Chengzhong Liu | Cui Yakun | Caleb Chen Cao | Sirui Han | Yike Guo
Findings of the Association for Computational Linguistics: ACL 2026
Data analysis has become an indispensable part of scientific research. Realizing the full value of massive datasets requires deep exploratory analysis to uncover the latent knowledge and insights hidden within them. With the advent of large language models (LLMs) and multi-agent systems, a growing number of researchers are applying these technologies to insight discovery. However, few benchmarks exist for evaluating insight discovery capabilities. InsightBench, one of the most comprehensive existing frameworks, suffers from many critical flaws: format inconsistencies, poorly conceived objectives, and redundant insights. These issues may significantly affect both the quality of the data and the evaluation of agents. To address them, we thoroughly investigate the shortcomings of InsightBench and propose essential criteria for a high-quality insight benchmark. Guided by these criteria, we develop a data-curation pipeline to construct a new dataset named InsightEval. We further introduce a novel metric to measure the exploratory performance of agents. Through extensive experiments on InsightEval, we highlight prevailing challenges in automated insight discovery and present key findings to guide future research in this promising direction.
VizoMem: A Visual-Textual Memory Framework for Efficient Long-Horizon Reasoning
Weijie Liang | Yuanfeng Song | Xing Chen | Caleb Chen Cao | Sirui Han | Yike Guo
Findings of the Association for Computational Linguistics: ACL 2026
Agentic systems built upon large language models (LLMs) increasingly depend on long-context modeling to support document understanding, long-term memory recall, and multi-step reasoning. However, extending context windows incurs substantial computational and memory overhead, significantly limiting the scalability and practicality of long-context LLM-based agents. Recent studies suggest that visual representations can serve as an effective medium for compressing and organizing long textual content. Motivated by this insight, we propose VizoMem, a novel visual memory framework for agentic systems. In this framework, textual memories are pre-rendered into structured images and stored as visual notes, enabling compact and persistent memory representations. Moving beyond standard vision-language models like Glyph, we pioneer a specialized retrieval system designed for large-scale visual memory. Our innovation lies in the construction of a dedicated dataset and the development of a highly efficient retrieval model that repurposes foundational vision-language encoders to navigate complex, text-heavy visual environments. Experiments on public datasets demonstrate that our approach significantly reduces token consumption while preserving effective long-term memory recall, highlighting its potential as a scalable alternative to conventional long-context modeling.
2025
ArchiDocGen: Multi-Agent Framework for Expository Document Generation in the Architectural Industry
Junjie Jiang | Haodong Wu | Yongqi Zhang | Songyue Guo | Bingcen Liu | Caleb Chen Cao | Ruizhe Shao | Chao Guan | Peng Xu | Lei Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
The architectural industry produces extensive documents, including method statements: expository documents that integrate multi-source data into actionable guidance. Manual drafting, however, is labor-intensive and time-consuming. This paper introduces ArchiDocGen, a multi-agent framework that automates method statement generation. Unlike traditional approaches that rely on static templates or single-pass generation, ArchiDocGen decomposes the task into three steps (outline generation, section-based content generation, and polishing), each handled by a specialized agent. To provide domain expertise, ArchiDocGen employs a section-based retriever to fetch and synthesize relevant documents from its custom knowledge base. Each section is generated through iterative reasoning under a section-based chain-of-thought (SeCoT) scheme and then refined to meet professional standards. To evaluate the generated method statements, we partner with industry to establish a multi-dimensional evaluation system that combines automatic and empirical methods. Experiments show that ArchiDocGen achieves a ContentScore of 4.38, excelling in specialization, completeness, organization, and clarity. Additionally, a web-based application of ArchiDocGen has been developed and deployed with industry partners.