Rui Ding


2026

Text analysis of tabular data relies on two core operations: summarization for corpus-level theme extraction and tagging for row-level labeling. A critical limitation of employing large language models (LLMs) for these tasks is their inability to meet the high standards of output stability demanded by data analytics. To address this challenge, we introduce CAST (Consistency via Algorithmic Prompting and Stable Thinking), a framework that enhances output stability by constraining the model’s latent reasoning trajectory. CAST combines (i) Algorithmic Prompting to impose a procedural scaffold over valid reasoning transitions and (ii) Thinking-before-Speaking to enforce explicit intermediate commitments before final generation. To measure progress, we introduce CAST-S and CAST-T, stability metrics for bulleted summarization and tagging, and validate their alignment with human judgments. Experiments across publicly available benchmarks on multiple LLM backbones show that CAST consistently achieves the best stability among all baselines, improving Stability Score by up to 16.2%, while maintaining or improving output quality.

2023

Exploring data is crucial in data analysis, as it helps users understand and interpret the data more effectively. However, performing effective data exploration requires in-depth knowledge of the dataset, the user intent and expertise in data analysis techniques. Not being familiar with either can create obstacles that make the process time-consuming and overwhelming. To address this issue, we introduce InsightPilot, an LLM (Large Language Model)-based, automated data exploration system designed to simplify the data exploration process. InsightPilot features a set of carefully designed analysis actions that streamline the data exploration process. Given a natural language question, InsightPilot collaborates with the LLM to issue a sequence of analysis actions, explore the data and generate insights. We demonstrate the effectiveness of InsightPilot in a user study and a case study, showing how it can help users gain valuable insights from their datasets.