Jingchao Ni
2026
TSAQA: Time Series Analysis Question And Answering Benchmark
Baoyu Jing | Sanhorn Chen | Lecheng Zheng | Boyu Liu | Zihao Li | Jiaru Zou | Tianxin Wei | Zhining Liu | Zhichen Zeng | Ruizhong Qiu | Xiao Lin | Yuchen Yan | Dongqi Fu | Jingchao Ni | Jingrui He | Hanghang Tong
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Baoyu Jing | Sanhorn Chen | Lecheng Zheng | Boyu Liu | Zihao Li | Jiaru Zou | Tianxin Wei | Zhining Liu | Zhichen Zeng | Ruizhong Qiu | Xiao Lin | Yuchen Yan | Dongqi Fu | Jingchao Ni | Jingrui He | Hanghang Tong
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Time series data are integral to applications across domains such as finance, healthcare, transportation, and environmental science.While recent work has begun to explore time series question answering (QA), existing benchmarks still provide limited coverage of analytical capabilities under a standardized evaluation framework. We introduce TSAQA, a novel unified benchmark designed to broaden task coverage and evaluate diverse temporal analysis capabilities. TSAQA integrates 6 diverse tasks under a single framework ranging fromconventional analysis, including anomaly detection and classification, to advanced analysis, such as characterization, comparison, datatransformation, and temporal relationship analysis. Spanning 210k samples across 13 domains, the dataset employs diverse formats, including true-or-false (TF), multiple-choice (MC), and a novel puzzling (PZ), to comprehensively assess time series analysis. Zero-shotevaluation shows that TSAQA remains challenging for current Large Language Models (LLMs): best-performing commercial model,Gemini-2.5-Flash, achieves 65.08 average accuracy. Although instruction tuning improves open-source models’ performance: the best-performing model, LLaMA-3.1-8B, shows significant room for improvement. We further evaluate language-capable time series foundation models (TSFMs), showing that TSAQA extends beyond general-purpose LLMs. The data are available in https://huggingface.co/datasets/TSAQA/TSAQA-Benchmark.
2025
Exploring Multi-Modal Data with Tool-Augmented LLM Agents for Precise Causal Discovery
ChengAo Shen | Zhengzhang Chen | Dongsheng Luo | Dongkuan Xu | Haifeng Chen | Jingchao Ni
Findings of the Association for Computational Linguistics: ACL 2025
ChengAo Shen | Zhengzhang Chen | Dongsheng Luo | Dongkuan Xu | Haifeng Chen | Jingchao Ni
Findings of the Association for Computational Linguistics: ACL 2025
Causal discovery is an imperative foundation for decision-making across domains, such as smart health, AI for drug discovery and AIOps. Traditional statistical causal discovery methods, while well-established, predominantly rely on observational data and often overlook the semantic cues inherent in cause-and-effect relationships. The advent of Large Language Models (LLMs) has ushered in an affordable way of leveraging the semantic cues for knowledge-driven causal discovery, but the development of LLMs for causal discovery lags behind other areas, particularly in the exploration of multi-modal data. To bridge the gap, we introduce MatMCD, a multi-agent system powered by tool-augmented LLMs. MatMCD has two key agents: a Data Augmentation agent that retrieves and processes modality-augmented data, and a Causal Constraint agent that integrates multi-modal data for knowledge-driven reasoning. The proposed design of the inner-workings ensures successful cooperation of the agents. Our empirical study across seven datasets suggests the significant potential of multi-modality enhanced causal discovery.
2021
Unsupervised Concept Representation Learning for Length-Varying Text Similarity
Xuchao Zhang | Bo Zong | Wei Cheng | Jingchao Ni | Yanchi Liu | Haifeng Chen
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Xuchao Zhang | Bo Zong | Wei Cheng | Jingchao Ni | Yanchi Liu | Haifeng Chen
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Measuring document similarity plays an important role in natural language processing tasks. Most existing document similarity approaches suffer from the information gap caused by context and vocabulary mismatches when comparing varying-length texts. In this paper, we propose an unsupervised concept representation learning approach to address the above issues. Specifically, we propose a novel Concept Generation Network (CGNet) to learn concept representations from the perspective of the entire text corpus. Moreover, a concept-based document matching method is proposed to leverage advances in the recognition of local phrase features and corpus-level concept features. Extensive experiments on real-world data sets demonstrate that new method can achieve a considerable improvement in comparing length-varying texts. In particular, our model achieved 6.5% better F1 Score compared to the best of the baseline models for a concept-project benchmark dataset.