Jiang Zhong

2025

LLMs have improved the fluency and informativeness of abstractive summarization but remain prone to hallucinations, where generated content deviates from the source document. Recent PMI decoding strategies mitigate over-reliance on prior knowledge by comparing output probabilities with and without source documents, effectively enhancing contextual utilization and improving faithfulness. However, existing strategies often neglect the explicit use of salient contextual information and rely on static hyperparameters to fix the balance between contextual and prior knowledge, limiting their flexibility. In this work, we propose Salience-Aware Reinforced Adaptive decoding (SARA), which incorporates salient information and allows the model to adaptively determine reliance on the source document’s context, salient context, and the model’s prior knowledge based on pointwise mutual information. Moreover, a tokenwise adaptive decoding mechanism via reinforcement learning is proposed in SARA to dynamically adjust the contributions of context and prior knowledge at each decoding timestep. Experiments on CNN/DM, WikiHow, and NYT50 datasets show that SARA consistently improves the quality and faithfulness of summaries across various LLM backbones without modifying their weights.

Large Language Models (LLMs) exhibit remarkable generative capabilities, enabling the generation of valuable information. Despite these advancements, previous research found that LLMs sometimes struggle with adhering to specific constraints, such as being in a specific place or at a specific time, and at times even overlook them, which leads to responses that are either too generic or not fully satisfactory. Existing approaches attempted to address this issue by decomposing and rewriting input instructions or reflecting on prior failings, yet they fall short in adequately emphasizing specific constraints and unlocking the underlying knowledge, such as programming within the context of software development. In response, this paper proposes a simple yet effective method called Chain-of-Specificity (CoS). Specifically, CoS emphasizes the specific constraints in the input instructions, unlocks knowledge within LLMs, and refines responses. Experiments conducted on publicly available and self-built complex datasets demonstrate that CoS outperforms existing methods in enhancing generated content, especially in terms of specificity. Additionally, as the number of specific constraints increases, other baselines falter, while CoS still performs well. Moreover, we show that distilling responses generated by CoS effectively enhances the ability of smaller models to follow constrained instructions.

Key Information Extraction (KIE) is a challenging multimodal task aimed at extracting structured value entities from visually rich documents. Despite recent advancements, two major challenges remain. First, existing datasets typically feature fixed layouts and a limited set of entity categories, while current methods are based on a full-shot setting that is difficult to apply in real-world scenarios, where new entity categories frequently emerge. Secondly, current methods often treat key entities simply as parts of the OCR-parsed context, neglecting the positive impact of the relationships between key-value entities. To address the first challenge, we introduce a new large-scale, human-annotated dataset, Complex Layout document for Key Information Extraction (CLEX). Comprising 5,860 images with 1,162 entity categories, CLEX is larger and more complex than existing datasets. It also primarily focuses on the zero-shot and few-shot KIE tasks, which are more aligned with real-world applications. To tackle the second challenge, we propose the Parallel Pointer-based Network (P²Net). This model frames KIE as a pointer-based classification task and effectively leverages implicit relationships between key-value entities to enhance extraction. Its parallel extraction mechanism enables simultaneous and efficient extraction of multiple results. Experiments on widely-used datasets, including SROIE, CORD, and the newly introduced CLEX, demonstrate that P²Net outperforms existing state-of-the-art methods (including GPT-4V) while maintaining fast inference speeds.

pdf bib abs
FedLEKE: Federated Locate-then-Edit Knowledge Editing for Multi-Client Collaboration
Zongkai Zhao | Guozeng Xu | Xiuhua Li | Kaiwen Wei | Jiang Zhong
Findings of the Association for Computational Linguistics: ACL 2025

Locate-then-Edit Knowledge Editing (LEKE) is a key technique for updating large language models (LLMs) without full retraining. However, existing methods assume a single-user setting and become inefficient in real-world multi-client scenarios, where decentralized organizations (e.g., hospitals, financial institutions) independently update overlapping knowledge, leading to redundant mediator knowledge vector (MKV) computations and privacy concerns.To address these challenges, we introduce Federated Locate-then-Edit Knowledge Editing (FedLEKE), a novel task that enables multiple clients to collaboratively perform LEKE while preserving privacy and reducing computational overhead. To achieve this, we propose FedEdit, a two-stage framework that optimizes MKV selection and reuse.In the first stage, clients locally apply LEKE and upload the computed MKVs. In the second stage, rather than relying solely on server-based MKV sharing, FedLEKE allows clients retrieve relevant MKVs based on cosine similarity, enabling knowledge re-edit and minimizing redundant computations.Experimental results on two benchmark datasets demonstrate that FedEdit retains over 96% of the performance of non-federated LEKE while significantly outperforming a FedAvg-based baseline by approximately twofold. Besides, we find that MEMIT performs more consistently than PMET in the FedLEKE task with our FedEdit framework. Our code is available at https://github.com/zongkaiz/FedLEKE.

Multimodal multi-label emotion recognition (MMER) aims to identify the concurrent presence of multiple emotions in multimodal data. Existing studies primarily focus on improving fusion strategies and modeling modality-to-label dependencies. However, they often overlook the impact of aleatoric uncertainty, which is the inherent noise in the multimodal data and hinders the effectiveness of modality fusion by introducing ambiguity into feature representations.To address this issue and effectively model aleatoric uncertainty, this paper proposes Latent emotional Distribution Decomposition with Uncertainty perception (LDDU) framework from a novel perspective of latent emotional space probabilistic modeling. Specifically, we introduce a contrastive disentangled distribution mechanism within the emotion space to model the multimodal data, allowing for the extraction of semantic features and uncertainty. Furthermore, we design an uncertainty-aware fusion multimodal method that accounts for the dispersed distribution of uncertainty and integrates distribution information. Experimental results show that LDDU achieves state-of-the-art performance on the CMU-MOSEI and M³ED datasets, highlighting the importance of uncertainty modeling in MMER. Code is available at https://github.com/201983290498/lddu_mmer.git.