Multimodal knowledge graph completion (MKGC) aims to predict missing triples in MKGs using multimodal information. Recent research typically either extracts information from each modality separately to predict, then ensembles the predictions at the decision stage, or projects multiple modalities into a unified feature space to learn multimodal representations for prediction. However, these methods usually overlook the intrinsic correlation between modalities in MKGs which should be leveraged in both unimodal knowledge extraction and multimodal knowledge fusion. Motivated by this, we propose a noval Modal-collaborative knowledge learning (Moodle) framework for MKGC, the key idea of which is to foster mutual guidance and collaboration during unimodal knowledge extraction, to let each modality acquire distinct and complementary knowledge that subsequently enhances the multimodal knowledge fusion. Specifically, Moodle preserves the representations of different modalities to learn unimodal knowledge while modeling the mutual guidance through multi-task learning. Furthermore, Moodle performs multimodal knowledge fusion and prediction guided by unimodal knowledge, capturing their synergistic relationships and acquire fine-grained semantic knowledge through contrastive learning. Extensive experiments on three real-world datasets demonstrate the advantages of Moodle over state-of-the-art methods.
Tool-calling has changed Large Language Model (LLM) applications by integrating external tools, significantly enhancing their functionality across diverse tasks. However, this integration also introduces new security vulnerabilities, particularly in the tool scheduling mechanisms of LLM, which have not been extensively studied. To fill this gap, we present ToolCommander, a novel framework designed to exploit vulnerabilities in LLM tool-calling systems through adversarial tool injection. Our framework employs a well-designed two-stage attack strategy. Firstly, it injects malicious tools to collect user queries, then dynamically updates the injected tools based on the stolen information to enhance subsequent attacks. These stages enable ToolCommander to execute privacy theft, launch denial-of-service attacks, and even manipulate business competition by triggering unscheduled tool-calling. Notably, the ASR reaches 91.67% for privacy theft and hits 100% for denial-of-service and unscheduled tool calling in certain cases. Our work demonstrates that these vulnerabilities can lead to severe consequences beyond simple misuse of tool-calling systems, underscoring the urgent need for robust defensive strategies to secure LLM Tool-calling systems.
Multimodal emotion recognition (MER) aims to identify emotions by utilizing affective information from multiple modalities. Due to the inherent disparities among these heterogeneous modalities, there is a large modality gap in their representations, leading to the challenge of fusing multiple modalities for MER. To address this issue, this work proposes a novel attention-based MER framework by integrating representation subspace mapping with unimodal auxiliary loss for enhancing multimodal fusion capabilities. Initially, a representation subspace mapping module is proposed to map each modality into two distinct subspaces. One is modality-public, enabling the acquisition of common representations and reducing the discrepancies across modalities. The other is modality-unique, retaining the unique characteristics of each modality while eliminating redundant inter-modal attributes. Then, a cross-modality attention is leveraged to bridge the modality gap in unique representations and facilitate modality adaptation. Additionally, our method designs an unimodal auxiliary loss to remove the noise unrelated to emotion classification, resulting in robust and meaningful representations for MER. Comprehensive experiments are conducted on the IEMOCAP and MSP-Improv datasets, and experiment results show that our method achieves superior performance to state-of-the-art MER methods. Keywords: Multimodal emotion recognition, representation subspace mapping, cross-modality attention, unimodal auxiliary loss, fusion