Caleb Chen Cao


2026

Data analysis has become an indispensable part of scientific research. To discover the latent knowledge and insights hidden within massive datasets, we need to perform deep exploratory analysis to realize their full value. With the advent of large language models (LLMs) and multi-agent systems, more and more researchers are using these technologies for insight discovery. However, there are few benchmarks for evaluating insight discovery capabilities. Even InsightBench, one of the most comprehensive existing frameworks, suffers from several critical flaws: format inconsistencies, poorly conceived objectives, and redundant insights. These flaws can compromise both the quality of the data and the evaluation of agents. To address them, we thoroughly investigate the shortcomings of InsightBench and propose essential criteria for a high-quality insight benchmark. Guided by these criteria, we develop a data-curation pipeline to construct a new dataset named InsightEval. We further introduce a novel metric to measure the exploratory performance of agents. Through extensive experiments on InsightEval, we highlight prevailing challenges in automated insight discovery and report key findings to guide future research in this promising direction.

Agentic systems built upon large language models (LLMs) increasingly depend on long-context modeling to support document understanding, long-term memory recall, and multi-step reasoning. However, extending context windows incurs substantial computational and memory overhead, significantly limiting the scalability and practicality of long-context LLM-based agents. Recent studies suggest that visual representations can serve as an effective medium for compressing and organizing long textual content. Motivated by this insight, we propose VizoMem, a novel visual memory framework for agentic systems. In this framework, textual memories are pre-rendered into structured images and stored as visual notes, enabling compact and persistent memory representations. Moving beyond standard vision-language models like Glyph, we pioneer a specialized retrieval system designed for large-scale visual memory. Our innovation lies in the construction of a dedicated dataset and the development of a highly efficient retrieval model that repurposes foundational vision-language encoders to navigate complex, text-heavy visual environments. Experiments on public datasets demonstrate that our approach significantly reduces token consumption while preserving effective long-term memory recall, highlighting its potential as a scalable alternative to conventional long-context modeling.

2025

The architectural industry produces extensive documents, including method statements, which are expository documents that integrate multi-source data into actionable guidance. Manual drafting, however, is labor-intensive and time-consuming. This paper introduces ArchiDocGen, a multi-agent framework that automates method statement generation. Unlike traditional approaches relying on static templates or single-pass generation, ArchiDocGen decomposes the task into three steps: outline generation, section-based content generation, and polishing, each handled by specialized agents. To provide domain expertise, ArchiDocGen employs a section-based retriever to fetch and synthesize relevant documents from its custom knowledge base. Each section is generated through iterative reasoning under a section-based chain-of-thought (SeCoT) scheme, followed by refinement to meet professional standards. To evaluate the generated method statements, we partner with industry to establish a multi-dimensional evaluation system that combines automatic and empirical methods. Experiments show that ArchiDocGen achieves a ContentScore of 4.38, excelling in specialization, completeness, organization, and clarity. Additionally, a web-based application for ArchiDocGen has been developed and deployed with industry partners.