Kexin Ma
2026
Conflict-Aware Memory for Embodied Agents: Enhancing Vector Data Quality via Detection Rules
Kexin Ma | Haotian Wang | Shenglin Chen | Yishuai Cai | Huangyuyu | Ruochun Jin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Embodied agents have successfully leveraged large language models (LLMs) to better transform human instructions and images into executable task plans. Furthermore, memories of agents can be leveraged to achieve continual self-learning and optimization. However, vector data quality problems emerge in memories when they are projected into vector space, especially in discerning contextually similar but semantically conflicting sentences and highly similar images. This is particularly detrimental to embodied AI as it potentially distorts the robot’s actions. To address this challenge, we propose Conflict Detection Rules (CDRs) to identify and manage data quality issues in vector knowledge bases, which assist in correcting the index structure and further improving the answer quality. Experimental results show that planners with CDRs exceed the basic LLM planner by 15.25% and 14.25% in grammatical accuracy (GA) and interpretation accuracy (IA) on average, respectively. Moreover, the entire workflow has been successfully integrated into various scenarios, demonstrating its practical applicability and robustness in the real world.
2024
Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs
Kexin Ma | Ruochun Jin | Wang Haotian | Wang Xi | Huan Chen | Yuhua Tang | Qian Wang
Findings of the Association for Computational Linguistics: EMNLP 2024
Retrieval-Augmented Large Language Models (RALMs) have made significant strides in enhancing the accuracy of generated responses. However, existing research often overlooks data quality issues within retrieval results, which are frequently caused by inaccurate vector-distance-based retrieval methods. We propose to boost the precision of RALMs’ answers from a data quality perspective through the Context-Driven Index Trimming (CDIT) framework, where Context Matching Dependencies (CMDs) are employed as logical data quality rules to capture and regulate the consistency between retrieved contexts. Based on the semantic comprehension capabilities of Large Language Models (LLMs), CDIT can effectively identify and discard retrieval results that are inconsistent with the query context and further modify indexes in the database, thereby improving answer quality. Experiments demonstrate an average improvement of 3.75% in accuracy on challenging open-domain question-answering tasks. The flexibility of CDIT is also verified through its compatibility with various language models and indexing methods, offering a promising approach to jointly bolster RALMs’ data quality and retrieval precision.