As large language models (LLMs) continue to advance, there is a growing urgency to enhance the interpretability of their internal knowledge mechanisms. Consequently, many interpretation methods have emerged, aiming to unravel the knowledge mechanisms of LLMs from various perspectives. However, current interpretation methods differ in their input data formats and in the outputs they produce. The tools that integrate these methods support only tasks with specific inputs, which significantly constrains their practical application. To address these challenges, we present an open-source **Know**ledge **M**echanisms **R**evealer&**I**nterpreter (**Know-MRI**) designed to analyze the knowledge mechanisms within LLMs systematically. Specifically, we have developed an extensible core module that can automatically match different input data with interpretation methods and consolidate the interpreting outputs. It enables users to freely choose appropriate interpretation methods based on the inputs, making it easier to comprehensively diagnose the model’s internal knowledge mechanisms from multiple perspectives. Our code is available at https://github.com/nlpkeg/Know-MRI. We also provide a demonstration video at https://youtu.be/NVWZABJ43Bs.
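The idea of matching input data to interpretation methods and consolidating their outputs can be illustrated with a small registry. This is only a sketch under stated assumptions, not the Know-MRI API: the names `Method`, `register`, `match_methods`, and `interpret` are hypothetical, and the two registered "methods" are trivial placeholders standing in for real interpretation techniques.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Method:
    """A hypothetical interpretation method and the input fields it needs."""
    name: str
    required_fields: FrozenSet[str]
    run: Callable[[dict], dict]


REGISTRY: List[Method] = []


def register(name: str, required_fields: set):
    """Decorator that adds a function to the registry (extensibility hook)."""
    def decorator(fn: Callable[[dict], dict]):
        REGISTRY.append(Method(name, frozenset(required_fields), fn))
        return fn
    return decorator


# Placeholder "methods": they only inspect the text, not a model.
@register("prompt_length", {"prompt"})
def prompt_length(sample: dict) -> dict:
    return {"n_tokens": len(sample["prompt"].split())}


@register("answer_overlap", {"prompt", "ground_truth"})
def answer_overlap(sample: dict) -> dict:
    prompt_words = set(sample["prompt"].lower().split())
    answer_words = set(sample["ground_truth"].lower().split())
    return {"shared_words": sorted(prompt_words & answer_words)}


def match_methods(sample: dict) -> List[Method]:
    """Return every method whose required fields are present in the sample."""
    return [m for m in REGISTRY if m.required_fields.issubset(sample.keys())]


def interpret(sample: dict) -> Dict[str, dict]:
    """Run all matching methods and consolidate outputs under one mapping."""
    return {m.name: m.run(sample) for m in match_methods(sample)}
```

Under these assumptions, a sample that contains only a `prompt` field is routed to the first method alone, while a sample that also carries `ground_truth` is routed to both, and the results come back keyed by method name.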
This paper presents KMatrix-2, an open-source toolkit that supports comprehensive heterogeneous knowledge collaborative enhancement for Large Language Models (LLMs). As the successor of KMatrix, our toolkit offers powerful modular components and typical enhancement patterns for convenient construction of mainstream knowledge-enhanced LLM systems. In addition, it provides unified knowledge integration and joint knowledge retrieval methods to achieve more comprehensive heterogeneous knowledge collaborative enhancement. Compared with KMatrix, which mainly focuses on descriptive knowledge, this work additionally considers procedural knowledge. Moreover, systematic inter-context and context-memory knowledge conflict resolution methods are offered for better knowledge integration. We analyze some key research questions in heterogeneous knowledge-enhanced LLM systems and validate our toolkit’s capability in building such systems.
Knowledge-Enhanced Large Language Models (K-LLMs) systems enhance the abilities of Large Language Models (LLMs) using external knowledge. Existing K-LLMs toolkits mainly focus on free-text knowledge, lack support for heterogeneous knowledge such as tables and knowledge graphs, and fall short in providing comprehensive datasets, models, and a user-friendly experience. To address this gap, we introduce KMatrix: a flexible heterogeneous knowledge enhancement toolkit for LLMs that includes verbalizing-retrieval and parsing-query methods. Our modular, control-logic flow diagram design flexibly supports the entire lifecycle of various complex K-LLMs systems, including training, evaluation, and deployment. To assist K-LLMs system research, a series of related knowledge, datasets, and models are integrated into our toolkit, along with performance analyses of K-LLMs systems enhanced by different types of knowledge. Using our toolkit, developers can rapidly build, evaluate, and deploy their own K-LLMs systems.
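One plausible reading of "verbalizing-retrieval" is that heterogeneous knowledge (table rows, knowledge-graph triples) is first rendered as plain text so that a single text retriever can search it. The sketch below illustrates that reading only. It is not KMatrix code: the function names are hypothetical, the sentence templates are invented for illustration, and simple word overlap stands in for whatever retriever the toolkit actually uses.

```python
import re
from typing import Dict, List, Set, Tuple


def verbalize_row(table_name: str, row: Dict[str, str]) -> str:
    """Render one table row as a sentence (illustrative template)."""
    cells = ", ".join(f"{col} is {val}" for col, val in row.items())
    return f"In table {table_name}: {cells}."


def verbalize_triple(triple: Tuple[str, str, str]) -> str:
    """Render one (subject, predicate, object) triple as a sentence."""
    subj, pred, obj = triple
    return f"{subj} {pred.replace('_', ' ')} {obj}."


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Rank passages by word overlap with the query (stand-in retriever)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda p: len(query_tokens & tokenize(p)),
        reverse=True,
    )
    return ranked[:k]


# Usage: one table row and one triple end up in the same text index.
passages = [
    verbalize_row("cities", {"city": "Paris", "country": "France"}),
    verbalize_triple(("Berlin", "capital_of", "Germany")),
]
top = retrieve("Which country is Paris in?", passages)
```

A parsing-query approach would go the other way, translating the question into a structured query over the original table or graph; that direction is not sketched here.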