Jiahao Zhang

Papers on this page may belong to the following people: Jiahao Zhang, Jiahao Zhang


2026

Code Language Models (CodeLLMs) traditionally learn attention based solely on statistical input-output token correlations ("machine attention"). In contrast, human developers rely on intuition, selectively fixating on semantically salient tokens during program comprehension. We present EyeMulator, a model-agnostic technique to align CodeLLM attention with human visual attention without architectural changes. By extracting scan paths from eye-tracking data, we derive token-level attention weights used to augment the loss function during fine-tuning. This induces the model to mimic human focus. Our evaluation across StarCoder, Llama-3.2, and DeepSeek-Coder shows that EyeMulator significantly outperforms baselines, achieving gains of over 30 CodeBLEU points in translation and up to 22 BERTScore points in summarization. Ablation studies confirm that these gains stem directly from replicating human attention dynamics. Artifacts are available at https://zenodo.org/records/17205682.
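
As a rough illustration of what "token-level attention weights used to augment the loss function" could look like, the sketch below computes a weighted negative log-likelihood in plain Python. It is not the EyeMulator implementation: the function name `weighted_token_loss`, the normalization by total weight, and the example weights are all assumptions made for illustration, and the abstract does not specify how the weights enter the loss.

```python
import math


def weighted_token_loss(target_probs, human_weights):
    """Weighted negative log-likelihood over target tokens.

    target_probs:  model probability assigned to each correct token (hypothetical values).
    human_weights: per-token weights, assumed here to come from eye-tracking scan paths.
    Normalizing by the total weight is an assumption, not taken from the paper.
    """
    if len(target_probs) != len(human_weights):
        raise ValueError("one weight is required per target token")
    total_weight = sum(human_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_nll = sum(-w * math.log(p) for p, w in zip(target_probs, human_weights))
    return weighted_nll / total_weight


# Hypothetical probabilities for three target tokens.
probs = [0.9, 0.5, 0.1]

# Uniform weights reduce to the ordinary mean cross-entropy.
uniform = weighted_token_loss(probs, [1.0, 1.0, 1.0])

# Up-weighting the token the model handles worst increases the loss,
# so fine-tuning would push harder on that token.
focused = weighted_token_loss(probs, [0.2, 0.2, 2.0])
```

With uniform weights the function matches the standard mean token loss, so in this sketch the human-derived weights act only as a reweighting on top of ordinary fine-tuning.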

2025

This paper investigates the group characteristics of multi-agent collaborative systems under adversarial attacks. Adversarial agents are tasked with generating counterfactual answers to a given collaborative problem, while collaborative agents interact normally with other agents to solve it. To simulate real-world collaboration as closely as possible, we evaluate the collaborative system in three collaboration scenarios and design three communication strategies and different group structures. Furthermore, we explore several methods to mitigate adversarial attacks, all of which prove effective in our experiments. To quantify the robustness of collaborative systems against such attacks, we introduce a novel metric, the System Defense Index (SDI). Finally, we conduct an in-depth analysis, from the perspective of group dynamics, of how adversarial agents affect multi-agent collaborative systems, which reveals similarities between the agent collaboration process and the human collaboration process. The code will be made available after publication.

Despite the demonstrated potential of Large Language Models (LLMs) in diverse NLP tasks, their causal reasoning capability appears inadequate when evaluated on the event causality identification (ECI) task. The ECI task poses significant complexity for LLMs and requires comprehensive causal priors for accurate identification. To improve the performance of LLMs in causal reasoning, we propose a multi-agent Decomposed reasoning framework for Event Causality Identification, designated as Dr.ECI. In the discovery stage, Dr.ECI incorporates specialized agents such as Causal Explorer and Mediator Detector, which capture implicit and indirect causality more effectively. In the reasoning stage, Dr.ECI introduces the agents Direct Reasoner and Indirect Reasoner, which leverage knowledge of the generalized causal structure specific to ECI. Extensive evaluations demonstrate the state-of-the-art performance of Dr.ECI compared with baselines based on LLMs and supervised training. Our implementation will be open-sourced at https://github.com/DMIRLAB-Group/Dr.ECI.

2024

Multi-hop question answering (QA) involves finding multiple relevant passages and step-by-step reasoning to answer complex questions, indicating a retrieve-and-read paradigm. However, previous retrievers were customized for two-hop questions, and most of them were trained separately across different hops, resulting in a lack of supervision over the entire multi-hop retrieval process and leading to poor performance in complicated scenarios beyond two hops. In this work, we introduce Beam Retrieval, an end-to-end beam retrieval framework for multi-hop QA. This approach models the multi-hop retrieval process in an end-to-end manner by jointly optimizing an encoder and two classification heads across all hops. Moreover, Beam Retrieval maintains multiple partial hypotheses of relevant passages at each step, expanding the search space and reducing the risk of missing relevant passages. To establish a complete QA system, we incorporate a supervised reader or a large language model (LLM). Experimental results demonstrate that Beam Retrieval achieves a nearly 50% improvement compared with baselines on challenging MuSiQue-Ans, and it also surpasses all previous retrievers on HotpotQA and achieves 99.9% precision on 2WikiMultiHopQA. Providing high-quality context, Beam Retrieval helps our supervised reader achieve new state-of-the-art performance and substantially improves the few-shot QA performance of LLMs.
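
The idea of keeping multiple partial hypotheses of relevant passages at each hop can be sketched as a small beam search. This is only an illustration under stated assumptions, not the Beam Retrieval system: the function names `beam_retrieve` and `overlap_score`, the additive chain score, and the word-overlap scorer are hypothetical stand-ins for the jointly trained encoder and classification heads mentioned in the abstract.

```python
def overlap_score(question, chain_passages, candidate):
    """Toy scorer (assumption): count candidate words shared with the
    question plus all passages already selected in the chain."""
    context = set(question.lower().split())
    for passage in chain_passages:
        context |= set(passage.lower().split())
    return float(len(context & set(candidate.lower().split())))


def beam_retrieve(question, passages, score, beam_size=2, hops=2):
    """Keep the `beam_size` best passage chains after each hop."""
    beams = [((), 0.0)]  # each hypothesis: (tuple of passage indices, total score)
    for _ in range(hops):
        candidates = []
        for chain, total in beams:
            selected = [passages[i] for i in chain]
            for idx, passage in enumerate(passages):
                if idx in chain:
                    continue  # never select the same passage twice
                gain = score(question, selected, passage)
                candidates.append((chain + (idx,), total + gain))
        if not candidates:
            break  # fewer passages than hops
        # Stable sort: ties keep their generation order.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


# Hypothetical two-hop example with one distractor passage.
passages = [
    "inception director christopher nolan",
    "christopher nolan birthplace london",
    "paris capital of france",
]
question = "birthplace of inception director"

beams = beam_retrieve(question, passages, overlap_score, beam_size=2, hops=2)
best_chain, best_score = beams[0]
```

Because a second hypothesis survives each hop, a passage that scores poorly against the question alone can still end up in the final chain once an earlier passage supplies the bridging terms. Greedy single-path retrieval would lose that option.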