Xin Yao
2026
AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora
Jiaxin Bai | Wei Fan | Qi Hu | Qing Zong | Chunyang Li | Hong Ting Tsang | Hongyu Luo | Yauwai Yim | Haoyu Huang | Xiao Zhou | Feng Qin | Tianshi Zheng | Xi Peng | Xin Yao | Huiwen Yang | Leijie Wu | JI Yi | Gong Zhang | Renhai Chen | Yangqiu Song
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas. Our system leverages large language models to simultaneously extract knowledge triples and induce comprehensive schemas directly from text, modeling both entities and events while employing conceptualization to organize instances into semantic categories. Processing over 50 million documents, we construct ATLAS (Automated Triple Linking And Schema induction), a family of knowledge graphs with 900+ million nodes and 5.9 billion edges. This approach outperforms state-of-the-art baselines on multi-hop QA tasks and enhances LLM factuality. Notably, our schema induction achieves 92% semantic alignment with human-crafted schemas with zero manual intervention, demonstrating that billion-scale knowledge graphs with dynamically induced schemas can effectively complement parametric knowledge in large language models.
RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents
Zijie Dai | Shiyuan Deng | Sheng Guan | Yizhou Tian | Xin Yao | Xiao Yan | James Cheng
Findings of the Association for Computational Linguistics: ACL 2026
Memory systems often organize user-agent interactions as retrievable external memory and are crucial for long-running agents because they overcome the limited context windows of LLMs. However, existing memory systems invoke LLMs to process every incoming interaction for memory extraction, and this eager memory consolidation scheme leads to substantial token consumption. To tackle this problem, we propose RecMem, which rethinks when memory consolidation should be conducted. RecMem stores incoming interactions in a subconscious memory layer and encodes them using lightweight embedding models for retrieval. LLMs are invoked to extract episodic and semantic memory only when sustained recurrence is observed for semantically similar interactions. Such recurrence-based consolidation works because these interactions correspond to a semantic cluster with rich information and are thus worth extracting and summarizing. To improve accuracy, RecMem also incorporates a semantic refinement mechanism that recovers the fine-grained facts omitted by memory extraction. Experiments show that RecMem reduces the memory construction token cost of three SOTA memory systems by up to 87% while exceeding their accuracy.
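The recurrence-triggered idea in the abstract above can be illustrated with a toy sketch. This is not the RecMem implementation: the class name RecurrenceBuffer, the cosine-similarity clustering, the running-mean centroid, the thresholds, and the consolidate callback (a stand-in for the LLM extraction call) are all assumptions made for illustration. Embeddings are passed in as plain lists of floats, standing in for the output of a lightweight embedding model. The sketch also assumes a cluster is consolidated only once, which the abstract does not specify.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Cluster:
    centroid: List[float]
    texts: List[str] = field(default_factory=list)
    consolidated: bool = False


class RecurrenceBuffer:
    """Hypothetical buffer: stores every interaction cheaply, and calls the
    (expensive) consolidate callback only when a semantic cluster recurs."""

    def __init__(
        self,
        consolidate: Callable[[List[str]], str],  # stand-in for an LLM call
        sim_threshold: float = 0.8,               # assumed value
        min_recurrence: int = 3,                  # assumed value
    ) -> None:
        self.consolidate = consolidate
        self.sim_threshold = sim_threshold
        self.min_recurrence = min_recurrence
        self.clusters: List[Cluster] = []
        self.llm_calls = 0

    def add(self, text: str, embedding: Sequence[float]) -> Optional[str]:
        # Find the most similar existing cluster at or above the threshold.
        best: Optional[Cluster] = None
        best_sim = self.sim_threshold
        for cluster in self.clusters:
            sim = cosine(cluster.centroid, embedding)
            if sim >= best_sim:
                best, best_sim = cluster, sim

        if best is None:
            # No similar cluster yet: start a new one seeded with this embedding.
            best = Cluster(centroid=list(embedding))
            self.clusters.append(best)
        else:
            # Update the centroid as a running mean of member embeddings.
            n = len(best.texts)
            best.centroid = [
                (c * n + e) / (n + 1) for c, e in zip(best.centroid, embedding)
            ]
        best.texts.append(text)

        # Consolidate only once the cluster has recurred often enough.
        if not best.consolidated and len(best.texts) >= self.min_recurrence:
            best.consolidated = True
            self.llm_calls += 1
            return self.consolidate(best.texts)
        return None
```

In this sketch, interactions that never recur stay in the buffer with only their embeddings and cost no consolidation call, which is the source of the token savings the abstract describes. The semantic refinement mechanism mentioned in the abstract is not modeled here.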