Zichao Li

Other people with similar names: Zichao Li

Unverified author pages with similar names: Zichao Li


2025

This paper presents Retrieval-Augmented Forecasting (RAF), a novel framework for tabular time-series prediction that dynamically retrieves and integrates relevant historical table slices. RAF addresses three key limitations of existing methods: 1) schema rigidity through dynamic hashing of column metadata, 2) temporal myopia via cross-attention with learned decay, and 3) pipeline sub-optimality via end-to-end retriever-forecaster co-training. Experiments across macroeconomic (FRED-MD), financial (Yahoo Finance), and development (WorldBank) benchmarks demonstrate RAF’s superiority over six baselines, reducing sMAPE by 19.1-26.5% while maintaining robustness to schema changes (+3.2% sMAPE increase vs. +6.7-12.7% for alternatives). The architecture’s computational overhead (1.8 vs. 1.2 hours/epoch vs. TFT) is justified by significant accuracy gains in critical scenarios like market shocks (61.7% vs. 55.1% directional accuracy).
This paper presents a knowledge-grounded framework for cryptocurrency scam detection using retrieval-augmented language models. We address three key limitations of existing approaches: static knowledge bases, unreliable LM outputs, and fixed classification thresholds. Our method combines (1) temporally-weighted retrieval from scam databases, (2) confidence-aware fusion of parametric and external knowledge, and (3) adaptive threshold optimization via gradient ascent. Experiments on CryptoScams and Twitter Financial Scams datasets demonstrate state-of-the-art performance, with 22% higher recall at equivalent precision compared to fixed thresholds, 4.3× lower hallucination rates than pure LMs, and 89% temporal performance retention on emerging scam types. The system achieves real-time operation (45ms/query) while maintaining interpretability through evidence grounding. Ablation studies confirm each component’s necessity, with confidence fusion proving most critical (12.1% performance drop when removed). These advances enable more robust monitoring of evolving cryptocurrency threats while addressing fundamental challenges in knowledgeable foundation models.
Mathematical information retrieval requires understanding the complex relationship between natural language and formulae. This paper presents a benchmarking study on Formula-Text Cross-Retrieval, comparing a sparse baseline (BM25), off-the-shelf dense embeddings (OpenAI, BGE), and a fine-tuned dual-encoder model. Our model, trained with a contrastive objective on the ARQAR dataset, significantly outperforms all baselines, achieving state-of-the-art results. Ablation studies confirm the importance of linearization, a shared-weight architecture, and the Multiple Negatives Ranking loss. The work provides a strong foundation for mathematical NLP applications.