Zichao Li
2025
Retrieval-Augmented Forecasting with Tabular Time Series Data
Zichao Li
Proceedings of the 4th Table Representation Learning Workshop
This paper presents Retrieval-Augmented Forecasting (RAF), a novel framework for tabular time-series prediction that dynamically retrieves and integrates relevant historical table slices. RAF addresses three key limitations of existing methods: 1) schema rigidity, through dynamic hashing of column metadata; 2) temporal myopia, via cross-attention with learned decay; and 3) pipeline sub-optimality, via end-to-end retriever-forecaster co-training. Experiments across macroeconomic (FRED-MD), financial (Yahoo Finance), and development (WorldBank) benchmarks demonstrate RAF’s superiority over six baselines, reducing sMAPE by 19.1-26.5% while remaining robust to schema changes (a 3.2% sMAPE increase vs. 6.7-12.7% for alternatives). The architecture’s computational overhead (1.8 hours/epoch vs. 1.2 hours/epoch for TFT) is justified by significant accuracy gains in critical scenarios such as market shocks (61.7% vs. 55.1% directional accuracy).
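For readers unfamiliar with the headline metric, the following is a minimal sketch of sMAPE (symmetric mean absolute percentage error) in its common form, 100/n times the sum of |F - A| / ((|A| + |F|) / 2). The abstract does not state which sMAPE variant the paper uses, so the formula, the function name `smape`, and the handling of zero denominators are assumptions for illustration only.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent, using the common (|A| + |F|) / 2 denominator.

    Assumption: terms where both actual and forecast are zero contribute
    zero error. The paper's exact convention is not given in the abstract.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        if denom == 0.0:
            continue  # both values are zero: treat as a perfect forecast
        total += abs(f - a) / denom
    return 100.0 * total / len(actual)


def relative_reduction(baseline_error, model_error):
    """Relative error reduction in percent, e.g. how a '19.1% reduction' is usually computed."""
    return 100.0 * (baseline_error - model_error) / baseline_error
```

Under this definition a perfect forecast scores 0 and the metric is bounded above by 200, which is why reductions are normally reported relative to a baseline rather than as absolute percentage points.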
Knowledge-Grounded Detection of Cryptocurrency Scams with Retrieval-Augmented LMs
Zichao Li
Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM)
This paper presents a knowledge-grounded framework for cryptocurrency scam detection using retrieval-augmented language models. We address three key limitations of existing approaches: static knowledge bases, unreliable LM outputs, and fixed classification thresholds. Our method combines (1) temporally-weighted retrieval from scam databases, (2) confidence-aware fusion of parametric and external knowledge, and (3) adaptive threshold optimization via gradient ascent. Experiments on CryptoScams and Twitter Financial Scams datasets demonstrate state-of-the-art performance, with 22% higher recall at equivalent precision compared to fixed thresholds, 4.3× lower hallucination rates than pure LMs, and 89% temporal performance retention on emerging scam types. The system achieves real-time operation (45ms/query) while maintaining interpretability through evidence grounding. Ablation studies confirm each component’s necessity, with confidence fusion proving most critical (12.1% performance drop when removed). These advances enable more robust monitoring of evolving cryptocurrency threats while addressing fundamental challenges in knowledgeable foundation models.
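The idea of temporally-weighted retrieval can be illustrated with a small sketch. The abstract does not specify the weighting function, so the exponential half-life decay below, along with the names `temporally_weighted_scores`, `top_k`, and `half_life_days`, is a hypothetical stand-in and not the paper's method.

```python
import math


def temporally_weighted_scores(similarities, ages_days, half_life_days=30.0):
    """Down-weight each similarity score by the age of its document.

    Assumption: exponential decay with a fixed half-life; a document that is
    one half-life old keeps half of its similarity score.
    """
    decay_rate = math.log(2) / half_life_days
    return [s * math.exp(-decay_rate * age)
            for s, age in zip(similarities, ages_days)]


def top_k(similarities, ages_days, k=2, half_life_days=30.0):
    """Return the indices of the k highest time-weighted scores, best first."""
    scores = temporally_weighted_scores(similarities, ages_days, half_life_days)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

With a 30-day half-life, a 90-day-old record with similarity 0.9 is ranked below a fresh record with similarity 0.8, which is the kind of behaviour that favours recently reported scam patterns.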
Formula-Text Cross-Retrieval: A Benchmarking Study of Dense Embedding Methods for Mathematical Information Retrieval
Zichao Li
Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025)
Mathematical information retrieval requires understanding the complex relationship between natural language and formulae. This paper presents a benchmarking study on Formula-Text Cross-Retrieval, comparing a sparse baseline (BM25), off-the-shelf dense embeddings (OpenAI, BGE), and a fine-tuned dual-encoder model. Our model, trained with a contrastive objective on the ARQAR dataset, significantly outperforms all baselines, achieving state-of-the-art results. Ablation studies confirm the importance of linearization, a shared-weight architecture, and the Multiple Negatives Ranking loss. The work provides a strong foundation for mathematical NLP applications.
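The Multiple Negatives Ranking loss named in the ablation can be sketched in plain Python. For a batch of (query, positive) embedding pairs, each query treats the other pairs' positives as negatives, and the loss is the mean cross-entropy over scaled cosine similarities. The function names and the `scale` value of 20.0 are assumptions for illustration; the abstract gives no hyperparameters, and embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(queries, positives, scale=20.0):
    """Mean in-batch softmax cross-entropy; positives[i] is the match for queries[i]."""
    losses = []
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the correct positive.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)
```

In a formula-text setting the queries could be text embeddings and the positives the embeddings of their matching formulae; the loss is near zero when each pair is aligned and grows when a query is closer to another pair's positive.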