Yifan Zhu
Papers on this page may belong to more than one person named Yifan Zhu.
2025
DecoupledESC: Enhancing Emotional Support Generation via Strategy-Response Decoupled Preference Optimization
Chao Zhang | Xin Shi | Xueqiao Zhang | Yifan Zhu | Yi Yang | Yawei Luo
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advances in Emotional Support Conversation (ESC) have improved emotional support generation by fine-tuning Large Language Models (LLMs) via Supervised Fine-Tuning (SFT). However, common psychological errors still persist. While Direct Preference Optimization (DPO) shows promise in reducing such errors through pairwise preference learning, its effectiveness in ESC tasks is limited by two key challenges: (1) Entangled data structure: Existing ESC data inherently entangles psychological strategies and response content, making it difficult to construct high-quality preference pairs; and (2) Optimization ambiguity: Applying vanilla DPO to such entangled pairwise data leads to ambiguous training objectives. To address these issues, we introduce Inferential Preference Mining (IPM) to construct high-quality preference data, forming the IPM-PrefDial dataset. Building upon this data, we propose a Decoupled ESC framework inspired by Gross’s Extended Process Model of Emotion Regulation, which decomposes the ESC task into two sequential subtasks: strategy planning and empathic response generation. Each subtask is trained via SFT and subsequently enhanced by DPO to align with psychological preferences. Extensive experiments demonstrate that our Decoupled ESC framework outperforms baselines, reducing preference bias and improving response quality.
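For context, the preference-alignment step above builds on the standard Direct Preference Optimization objective (Rafailov et al., 2023), applied here separately to the strategy-planning and response-generation subtasks. A reference form of that objective, in our own notation rather than the paper's, is:

```latex
% Standard DPO objective; \pi_\theta is the policy being trained,
% \pi_{\mathrm{ref}} the frozen SFT reference model, (x, y_w, y_l) a prompt
% with preferred and dispreferred responses, and \beta a temperature.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]
```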
MASTER: Multi-Agent Security Through Exploration of Roles and Topological Structures - A Comprehensive Framework
Yifan Zhu | Chao Zhang | Xin Shi | Xueqiao Zhang | Yi Yang | Yawei Luo
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Model (LLM)-based Multi-Agent Systems (MAS) exhibit remarkable problem-solving and task planning capabilities across diverse domains due to their specialized agentic roles and collaborative interactions. However, these same properties also amplify the severity of security risks when a MAS comes under attack. To address this, we introduce MASTER, a novel security research framework for MAS, focusing on diverse role configurations and topological structures across various scenarios. MASTER offers an automated construction process for different MAS setups and an information-flow-based interaction paradigm. To tackle MAS security challenges in varied scenarios, we design a scenario-adaptive, extensible attack strategy utilizing role and topological information, which dynamically allocates targeted, domain-specific attack tasks for collaborative agent execution. Our experiments demonstrate that such an attack, leveraging role and topological information, exhibits significant destructive potential across most models. Additionally, we propose corresponding defense strategies, substantially enhancing MAS resilience across diverse scenarios. We anticipate that our framework and findings will provide valuable insights for future research into MAS security challenges.
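To make the role-and-topology framing concrete, here is a minimal sketch of how a MAS configuration might be represented; all names (Agent, build_mas, the ring/star layouts) are our illustrative assumptions, not MASTER's actual API:

```python
# Minimal sketch of a role/topology MAS configuration in the spirit of MASTER.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    role: str                       # e.g., "planner", "coder", "reviewer"
    inbox: list = field(default_factory=list)

def build_mas(roles: list[str], topology: str) -> dict[str, list[str]]:
    """Return a directed message-passing graph over agents for a given topology."""
    n = len(roles)
    if topology == "ring":          # each agent forwards to its successor
        return {roles[i]: [roles[(i + 1) % n]] for i in range(n)}
    if topology == "star":          # a hub broadcasts to leaves and collects replies
        hub, leaves = roles[0], roles[1:]
        return {hub: leaves, **{leaf: [hub] for leaf in leaves}}
    # fully connected fallback
    return {r: [s for s in roles if s != r] for r in roles}

# A scenario-adaptive attacker could inspect the graph to pick high-degree
# agents (e.g., the star hub) as injection targets before dispatching tasks.
print(build_mas(["planner", "coder", "reviewer"], "star"))
```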
Multimodal Common Ground Annotation for Partial Information Collaborative Problem Solving
Yifan Zhu | Changsoo Jung | Kenneth Lai | Videep Venkatesha | Mariah Bradford | Jack Fitzgerald | Huma Jamil | Carine Graff | Sai Kiran Ganesh Kumar | Bruce Draper | Nathaniel Blanchard | James Pustejovsky | Nikhil Krishnaswamy
Proceedings of the 21st Joint ACL - ISO Workshop on Interoperable Semantic Annotation (ISA-21)
This project note describes challenges and procedures undertaken in annotating an audiovisual dataset capturing a multimodal situated collaborative construction task. In the task, all participants begin with different partial information, and must collaborate using speech, gesture, and action to arrive at a solution that satisfies all individual pieces of private information. This rich data poses a number of annotation challenges, from small objects in a close space to the implicit and multimodal fashion in which participants express agreement, disagreement, and beliefs. We discuss the data collection procedure, annotation schemas and tools, and future use cases.
TRACE: Real-Time Multimodal Common Ground Tracking in Situated Collaborative Dialogues
Hannah VanderHoeven | Brady Bhalla | Ibrahim Khebour | Austin C. Youngren | Videep Venkatesha | Mariah Bradford | Jack Fitzgerald | Carlos Mabrey | Jingxuan Tu | Yifan Zhu | Kenneth Lai | Changsoo Jung | James Pustejovsky | Nikhil Krishnaswamy
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
We present TRACE, a novel system for live “common ground” tracking in situated collaborative tasks. With a focus on fast, real-time performance, TRACE tracks the speech, actions, gestures, and visual attention of participants, uses these multimodal inputs to determine the set of task-relevant propositions that have been raised as the dialogue progresses, and tracks the group’s epistemic position and beliefs toward them as the task unfolds. Amid increased interest in AI systems that can mediate collaborations, TRACE represents an important step forward for agents that can engage with multiparty, multimodal discourse.
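As a rough illustration of what live common-ground tracking involves, the sketch below maintains a per-proposition epistemic status over a stream of fused multimodal frames; the data model and promotion rule are our simplifications, not TRACE's actual components:

```python
# Toy event loop for live common-ground tracking, loosely in the spirit of TRACE.
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Frame:
    speech: str               # ASR transcript for this time window
    gesture: Optional[str]    # e.g., "point:red_block"
    action: Optional[str]     # e.g., "place:red_block:scale"

def extract_propositions(frame: Frame) -> set[str]:
    """Stub mapping fused multimodal evidence to task-relevant propositions."""
    if "ten" in frame.speech and frame.action == "place:red_block:scale":
        return {"weight(red_block) = 10"}
    return set()

def track(stream: Iterable[Frame]) -> dict[str, str]:
    """Track each proposition's epistemic status as the dialogue unfolds."""
    status: dict[str, str] = {}   # proposition -> "raised" | "accepted"
    for frame in stream:
        for prop in extract_propositions(frame):
            # First mention raises the proposition; renewed cross-modal
            # support promotes it toward group-accepted common ground.
            status[prop] = "accepted" if prop in status else "raised"
    return status
```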
Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
Xueqiao Zhang | Chao Zhang | Jingtao Xu | Yifan Zhu | Xin Shi | Yi Yang | Yawei Luo
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Role-playing agents (RPAs) have attracted growing interest for their ability to simulate immersive and interactive characters. However, existing approaches primarily focus on static role profiles, overlooking the dynamic perceptual abilities inherent to humans. To bridge this gap, we introduce the concept of dynamic role profiles by incorporating video modality into RPAs. To support this, we construct Role-playing-Video60k, a large-scale, high-quality dataset comprising 60k videos and 700k corresponding dialogues. Based on this dataset, we develop a comprehensive RPA framework that combines adaptive temporal sampling with both dynamic and static role profile representations. Specifically, the dynamic profile is created by adaptively sampling video frames and feeding them to the LLM in temporal order, while the static profile consists of (1) character dialogues from training videos during fine-tuning, and (2) a summary context from the input video during inference. This joint integration enables RPAs to generate better responses. Furthermore, we propose a robust evaluation method covering eight metrics. Experimental results demonstrate the effectiveness of our framework, highlighting the importance of dynamic role profiles in developing RPAs.
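The adaptive temporal sampling mentioned above could, for instance, favor frames where the video changes most. The criterion below (frame-difference energy) and the function name are our assumptions, since the paper's exact sampler is not described here:

```python
# Hedged sketch of adaptive temporal sampling for a dynamic role profile:
# sample more frames where the video changes quickly, fewer where it is static.
# Requires: pip install opencv-python numpy
import cv2
import numpy as np

def adaptive_sample(path: str, budget: int = 32) -> list[np.ndarray]:
    cap = cv2.VideoCapture(path)
    frames, scores, prev = [], [], None
    ok, frame = cap.read()
    while ok:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Score each frame by how much it differs from its predecessor.
        scores.append(0.0 if prev is None else float(np.mean(cv2.absdiff(gray, prev))))
        frames.append(frame)
        prev = gray
        ok, frame = cap.read()
    cap.release()
    # Keep the `budget` most-changed frames, restored to temporal order
    # before they are fed to the LLM.
    keep = sorted(np.argsort(scores)[-budget:])
    return [frames[i] for i in keep]
```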
2024
Common Ground Tracking in Multimodal Dialogue
Ibrahim Khalil Khebour | Kenneth Lai | Mariah Bradford | Yifan Zhu | Richard A. Brutti | Christopher Tam | Jingxuan Tu | Benjamin A. Ibarra | Nathaniel Blanchard | Nikhil Krishnaswamy | James Pustejovsky
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Within Dialogue Modeling research in AI and NLP, considerable attention has been devoted to “dialogue state tracking” (DST), which is the ability to update the representations of the speaker’s needs at each turn in the dialogue by taking into account the past dialogue moves and history. Less studied but just as important to dialogue modeling, however, is “common ground tracking” (CGT), which identifies the shared belief space held by all of the participants in a task-oriented dialogue: the task-relevant propositions all participants accept as true. In this paper we present a method for automatically identifying the current set of shared beliefs and “questions under discussion” (QUDs) of a group with a shared goal. We annotate a dataset of multimodal interactions in a shared physical space with speech transcriptions, prosodic features, gestures, actions, and facets of collaboration, and operationalize these features for use in a deep neural model to predict moves toward construction of common ground. Model outputs cascade into a set of formal closure rules derived from situated evidence and belief axioms and update operations. We empirically assess the contribution of each feature type toward successful construction of common ground relative to ground truth, establishing a benchmark in this novel, challenging task.
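A drastically simplified version of such closure rules might look as follows; the move types and update logic are toy stand-ins for the paper's formal evidence and belief axioms:

```python
# Toy evidence-driven closure rules for common ground construction.
def update(common_ground: set[str], quds: set[str], move: tuple[str, str]) -> None:
    """Apply one dialogue move, given as (move_type, proposition)."""
    kind, prop = move
    if kind == "statement":      # an assertion puts the proposition under discussion
        quds.add(prop)
    elif kind == "accept":       # explicit or gestural acceptance closes the QUD
        if prop in quds:
            quds.discard(prop)
            common_ground.add(prop)
    elif kind == "doubt":        # a challenge reopens an accepted proposition
        common_ground.discard(prop)
        quds.add(prop)

cg, quds = set(), set()
update(cg, quds, ("statement", "weight(red)=10"))
update(cg, quds, ("accept", "weight(red)=10"))
assert "weight(red)=10" in cg
```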
2023
UMR annotation of Chinese Verb compounds and related constructions
Haibo Sun | Yifan Zhu | Jin Zhao | Nianwen Xue
Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023)
This paper discusses the challenges of annotating the predicate-argument structure of Chinese verb compounds in Uniform Meaning Representation (UMR), a recent meaning representation framework that extends Abstract Meaning Representation (AMR) to cross-linguistic settings. The key issue is whether to annotate the argument structure of a verb compound as a whole, or to annotate the argument structure of its component verbs along with the relations between them. We examine different types of Chinese verb compounds and propose how to annotate them based on the principle of compositionality, the level of grammaticalization, and the productivity of the component verbs. Because Chinese verb compounds are quite open-ended, defining semantic roles for every compound is impractical; we address this by separating compositional verb compounds from those that are non-compositional or have grammaticalized verb components. For compositional verb compounds, since creating an exhaustive list of them is infeasible, we annotate the argument structure of the component verbs as well as the semantic relations between them, rather than the argument structure of the compound as a whole. Verb compounds with grammaticalized verb components, which also tend to be productive, are represented as either attributes of the primary verb or as relations.
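As a concrete illustration of the compositional strategy, a resultative compound such as 打破 dǎ-pò ‘hit-break’ (as in 他打破了杯子 ‘he broke the cup by hitting it’) could be decomposed so that each component verb keeps its own arguments, linked by a result relation. The structure and role labels below are our illustrative stand-ins, not official UMR annotation:

```python
# Hypothetical decomposition of a compositional resultative compound.
# PropBank-style ARG labels and the "result" link are illustrative only.
compound = {
    "predicate": "打 dǎ 'hit'",
    "args": {"ARG0": "他 'he'", "ARG1": "杯子 'cup'"},
    "result": {                      # semantic relation linking the components
        "predicate": "破 pò 'break'",
        "args": {"ARG1": "杯子 'cup'"},
    },
}
```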
Co-authors
- Mariah Bradford 3
- Nikhil Krishnaswamy 3
- Kenneth Lai 3
- Yawei Luo 3
- James Pustejovsky 3
- Xin Shi 3
- Yi Yang 3
- Chao Zhang 3
- Xueqiao Zhang 3
- Nathaniel Blanchard 2
- Jack Fitzgerald 2
- Changsoo Jung 2
- Jingxuan Tu 2
- Videep Venkatesha 2
- Brady Bhalla 1
- Richard A. Brutti 1
- Bruce Draper 1
- Carine Graff 1
- Benjamin A. Ibarra 1
- Huma Jamil 1
- Ibrahim Khalil Khebour 1
- Ibrahim Khebour 1
- Sai Kiran Ganesh Kumar 1
- Carlos Mabrey 1
- Haibo Sun 1
- Christopher Tam 1
- Hannah VanderHoeven 1
- Jingtao Xu 1
- Nianwen Xue 1
- Austin C. Youngren 1
- Jin Zhao 1