Integrating multimodal clinical records—such as Electronic Health Records (EHR) and free-text clinical reports—has shown great potential in predicting clinical outcomes. However, prior work has primarily focused on capturing temporal interactions within individual samples and fusing multimodal information, overlooking critical temporal patterns across different patients. These patterns, such as trends in vital signs like abnormal heart rate or blood pressure, can indicate deteriorating health or an impending critical event of any individual in a given population. Similarly, clinical notes often contain textual descriptions that reflect these changes. Identifying corresponding temporal patterns across different modalities is crucial for improving the accuracy of clinical outcome predictions, yet it remains a challenging task. To address this gap, we introduce a Cross-modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data. Our approach introduces shared initial temporal pattern representations and refines them using slot attention to generate temporal semantic embeddings. To ensure rich cross-modal temporal semantics in the learned patterns, we introduce a Temporal Pattern Noise Contrastive Estimation (TP-NCE) loss for cross-modal alignment, along with two reconstruction losses to retain core information of each modality. Evaluations on two clinically critical tasks—48 hour in-hospital mortality and 24-hour phenotype classification—using the MIMIC-III database demonstrate the superiority of our method over existing approaches. The code is anonymously available at https://github.com/HKU-MedAI/CTPD.
Large Language Models (LLMs) have emerged as a pivotal research area, yet the attention module remains a critical bottleneck in LLM inference, even with techniques like KVCache to mitigate redundant computations. While various top-k attention mechanisms have been proposed to accelerate LLM inference by exploiting the inherent sparsity of attention, they often struggled to strike a balance between efficiency and accuracy. In this paper, we introduce HATA (Hash-Aware Top-k Attention), a novel approach that systematically integrates low-overhead learning-to-hash techniques into the Top-k attention process. Different from the existing top-k attention methods which are devoted to seeking an absolute estimation of qk score, typically with a great cost, HATA maps queries and keys into binary hash codes, and acquires the relative qk score order with a quite low cost, which is sufficient for realizing top-k attention. Extensive experiments demonstrate that HATA achieves up to 7.2× speedup compared to vanilla full attention while maintaining model accuracy. In addition, HATA outperforms the state-of-the-art top-k attention methods in both accuracy and efficiency across multiple mainstream LLM models and diverse tasks. HATA is open source at https://github.com/gpzlx1/HATA.
Chain-of-Thought prompting has improved reasoning capability of large language models (LLM). However, it still is challenging to guarantee the effectiveness and stability for questions requiring complicated reasoning. Recently, Plan-and-Solve prompting enhances the reasoning capability for complex questions by planning the solution steps firstly and then solving them step by step, but it suffers the difficulty to represent and execute the problem-solving logic of complex questions. To deal with these challenges, in this work, we propose a novel Plan-and-Solve prompting method based on Question Decomposition Meaning Representation (QDMR). Specifically, this method first allows the LLM to generate a QDMR graph to represent the problem-solving logic, which is a directed acyclic graph composed of sub-questions. Then, the LLM generates a specific solving process based on the QDMR graph. When solving each sub-question, it can locate the preceding sub-questions and their answers according to the QDMR graph, and then utilize this information for solution. Compared with existing Plan-and-Solve prompting techniques, our method can not only represent the problem-solving logic of complicated questions more accurately with the aid of QDMR graph, but also deliver the dependence information accurately for different solution steps according to the QDMR graph. In addition, with the supervised fine-tuning on the Allen Institute dataset, the decomposing capability of LLM for complicated questions can be considerably enhanced. Extensive experiments show that our method has achieve a great significance in arithmetic reasoning and commonsense reasoning task by comparing the classical Chain-of-Thought prompting and Plan-and-Solve prompting techniques, and the improvements achieved are even greater for problems with more reasoning steps.
Recently, sentiment analysis has seen remarkable advance with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that they are widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at
https://github.com/baidu/Senta.