Yun Xue
Also published as:
云 薛
Multi-agent collaboration exhibits exceptional capabilities in natural language understanding and generation. By prompting agents to take on clearly defined roles, it is possible to facilitate cooperation and achieve complementary capabilities among LLMs. A common strategy adopts a relatively generic role assignment mechanism, such as introducing a “judge” or a “summarizer”; however, these approaches lack task-specific role customization based on task characteristics. Another strategy decomposes the task according to domain knowledge and task characteristics, and then assigns appropriate roles based on the LLMs’ respective strengths, such as programmer and tester. However, for some tasks it is hard to obtain domain knowledge about task characteristics or to identify the strengths of different LLMs. To address these problems, we propose a Multi-LLM Cooperation (MLC) framework with automatic role assignment. The core idea of MLC is to initialize role assignments randomly and then learn the role embeddings jointly with the downstream task. To capture the state transitions of multiple LLMs during turn-based speaking, the role embeddings are sequence-aware. At the same time, to avoid role convergence, the role differentiation module in MLC encourages behavioral differentiation between LLMs while preserving the consistency of the LLM team, guiding different LLMs to develop complementary strengths at the optimization level. Our experiments on seven datasets demonstrate that MLC significantly enhances collaboration and expertise in addressing multi-agent tasks.
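Since the abstract only sketches the mechanism, the following is a minimal, hypothetical PyTorch sketch of how randomly initialized role embeddings could be trained jointly with a downstream objective while a differentiation penalty discourages role convergence. All names (`RoleAssigner`, `diff_weight`, the GRU used for sequence awareness) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoleAssigner(nn.Module):
    """Hypothetical sketch: learnable role embeddings for a team of LLMs.

    Roles start from random initialization (random role assignment) and are
    optimized with the downstream task; a differentiation penalty keeps the
    role embeddings from collapsing onto each other.
    """

    def __init__(self, num_llms: int, role_dim: int, diff_weight: float = 0.1):
        super().__init__()
        self.roles = nn.Embedding(num_llms, role_dim)             # random init = random roles
        self.gru = nn.GRU(role_dim, role_dim, batch_first=True)   # sequence-aware state over turns
        self.diff_weight = diff_weight

    def forward(self, turn_order: torch.Tensor) -> torch.Tensor:
        # turn_order: (batch, num_turns) index of which LLM speaks at each turn
        turn_roles = self.roles(turn_order)                       # (batch, num_turns, role_dim)
        states, _ = self.gru(turn_roles)                          # role state transitions across turns
        return states

    def differentiation_loss(self) -> torch.Tensor:
        # Penalize pairwise similarity between role embeddings to avoid role convergence.
        r = F.normalize(self.roles.weight, dim=-1)
        sim = r @ r.t()
        off_diag = sim - torch.diag(torch.diag(sim))
        return self.diff_weight * off_diag.abs().mean()

# Usage: the task loss below is a stand-in for the real downstream objective.
assigner = RoleAssigner(num_llms=3, role_dim=64)
turns = torch.randint(0, 3, (2, 5))          # 2 dialogues, 5 speaking turns each
states = assigner(turns)
task_loss = states.pow(2).mean()             # placeholder downstream loss
loss = task_loss + assigner.differentiation_loss()
loss.backward()
```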
Multi-modal sarcasm detection aims to identify whether a given image-text pair is sarcastic. The pivotal factor in this task lies in accurately capturing incongruities across modalities. Although existing studies have achieved impressive success, they have primarily focused on fusing textual and visual information to establish cross-modal correlations, overlooking the significance of the original unimodal incongruity information at the text level and image level. Furthermore, existing fusion strategies for cross-modal information neglect the effect of inherent ambiguity within the text and image modalities on multimodal fusion. To overcome these limitations, we propose a novel Ambiguity-aware Multi-level Incongruity Fusion Network (AMIF) for multi-modal sarcasm detection. Our method employs a multi-level incongruity learning module to capture incongruity information simultaneously at the text level, image level, and cross-modal level. Additionally, an ambiguity-based fusion module dynamically learns reasonable weights and interpretably aggregates incongruity features from different levels. Comprehensive experiments on a publicly available dataset demonstrate the superiority of our proposed model over state-of-the-art methods.
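To make the ambiguity-based fusion idea concrete, here is a minimal sketch, assuming one ambiguity estimator per incongruity level whose score down-weights that level in a softmax-normalized combination. The module structure and names (`AmbiguityFusion`, `ambiguity_heads`) are hypothetical, not the AMIF architecture itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AmbiguityFusion(nn.Module):
    """Hypothetical sketch of ambiguity-based fusion: each level's incongruity
    feature receives a weight that shrinks as its estimated ambiguity grows."""

    def __init__(self, feat_dim: int, num_levels: int = 3):
        super().__init__()
        # One ambiguity estimator per level (text, image, cross-modal).
        self.ambiguity_heads = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(num_levels)]
        )

    def forward(self, level_feats):
        # level_feats: list of (batch, feat_dim) tensors, one per incongruity level
        ambiguities = torch.stack(
            [torch.sigmoid(h(f)).squeeze(-1) for h, f in zip(self.ambiguity_heads, level_feats)],
            dim=-1,
        )                                           # (batch, num_levels); higher = more ambiguous
        weights = F.softmax(-ambiguities, dim=-1)   # less ambiguous level -> larger weight
        fused = sum(w.unsqueeze(-1) * f for w, f in zip(weights.unbind(-1), level_feats))
        return fused, weights
```

The returned weights also give a per-example view of which level (text, image, or cross-modal) the fusion relied on, which is one way the aggregation could be made interpretable.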
Graph-enhanced large language models (LLMs) leverage LLMs’ remarkable ability to model language and use graph structures to capture topological relationships. Existing graph-enhanced LLMs typically retrieve similar subgraphs to augment the LLM, where the subgraphs carry the entities related to the target and the relations among those entities. However, existing retrieval methods focus solely on accurately matching the target subgraph against candidate subgraphs at the same scale, neglecting that subgraphs of different scales may also share similar semantics or structures. To tackle this challenge, we introduce a graph-enhanced LLM with multi-scale retrieval (MSG-LLM). It captures similar graph structures and semantics across graphs at different scales and bridges graph alignment across multiple scales: larger scales maintain the graph’s global information, while smaller scales preserve the details of fine-grained sub-structures. Specifically, we construct a multi-scale variation that dynamically shrinks the scale of graphs. Further, we employ a graph kernel search to discover subgraphs from the entire graph, which essentially achieves multi-scale graph retrieval in a Hilbert space. Additionally, we conduct multi-scale interactions (message passing) over graphs at various scales to integrate key information. The interaction also bridges the graph and the LLM, helping both graph retrieval and LLM generation. Finally, we employ Chain-of-Thought-based LLM prediction to perform the downstream tasks. We evaluate our approach on two graph-based downstream tasks, and the experimental results show that our method achieves state-of-the-art performance.
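The following is a rough, hypothetical sketch of the multi-scale retrieval idea: candidate graphs are coarsened to several scales and scored against the coarsened query with a kernel-style similarity. The edge-contraction coarsening and the degree-histogram kernel are crude stand-ins chosen for brevity; MSG-LLM's actual multi-scale variation and graph kernel search are not specified here.

```python
import numpy as np
import networkx as nx

def coarsen(graph: nx.Graph, ratio: float = 0.5) -> nx.Graph:
    """Hypothetical scale shrinking: contract edges until roughly `ratio` of the nodes remain."""
    g = graph.copy()
    target = max(1, int(len(g) * ratio))
    while len(g) > target:
        edges = list(g.edges())
        if not edges:
            break
        u, v = edges[0]   # a real system would use a learned / structure-aware matching
        g = nx.contracted_nodes(g, u, v, self_loops=False)
    return g

def degree_kernel(g1: nx.Graph, g2: nx.Graph, max_deg: int = 32) -> float:
    """Stand-in graph kernel: inner product of normalized degree histograms,
    a crude proxy for similarity in a Hilbert space."""
    def hist(g):
        h = np.zeros(max_deg)
        for _, d in g.degree():
            h[min(d, max_deg - 1)] += 1
        return h / max(1, len(g))
    return float(hist(g1) @ hist(g2))

def multiscale_retrieve(query: nx.Graph, candidates, scales=(1.0, 0.5, 0.25)):
    """Score each candidate by its best kernel similarity to the query across all scale pairs."""
    query_views = [coarsen(query, r) for r in scales]
    scored = []
    for cand in candidates:
        cand_views = [coarsen(cand, r) for r in scales]
        score = max(degree_kernel(q, c) for q in query_views for c in cand_views)
        scored.append((score, cand))
    scored.sort(key=lambda x: x[0], reverse=True)
    return scored
```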
Aspect-based sentiment analysis (ABSA) is a crucial task in information extraction and sentiment analysis, aiming to identify aspects with associated sentiment elements in text. However, existing ABSA datasets are predominantly English-centric, limiting the scope for multilingual evaluation and research. To bridge this gap, we present M-ABSA, a comprehensive dataset spanning 7 domains and 21 languages, making it the most extensive multilingual parallel dataset for ABSA to date. Our primary focus is on triplet extraction, which involves identifying aspect terms, aspect categories, and sentiment polarities. The dataset is constructed through an automatic translation process with human review to ensure quality. We perform extensive experiments using various baselines to assess performance and compatibility on M-ABSA. Our empirical findings highlight that the dataset enables diverse evaluation tasks, such as multilingual and multi-domain transfer learning, and large language model evaluation, underscoring its inclusivity and its potential to drive advancements in multilingual ABSA research.
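For readers unfamiliar with triplet extraction, the record below illustrates the kind of annotation the task targets: aspect term, aspect category, and sentiment polarity per opinion. The sentence, field names, and category labels are invented for illustration and are not drawn from M-ABSA.

```python
# Hypothetical ABSA triplet record; schema and labels are illustrative only.
example = {
    "text": "The battery lasts all day, but the screen scratches easily.",
    "language": "en",
    "domain": "electronics",
    "triplets": [
        {"aspect_term": "battery", "aspect_category": "battery#performance", "polarity": "positive"},
        {"aspect_term": "screen", "aspect_category": "display#quality", "polarity": "negative"},
    ],
}
```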
While text-based emotion recognition methods have achieved notable success, real-world dialogue systems often demand a more nuanced emotional understanding than any single modality can offer. Multimodal Emotion Recognition in Conversations (MERC) has thus emerged as a crucial direction for enhancing the naturalness and emotional understanding of human-computer interaction. Its goal is to accurately recognize emotions by integrating information from various modalities such as text, speech, and visual signals. This survey offers a systematic overview of MERC, including its motivations, core tasks, representative methods, and evaluation strategies. We further examine recent trends, highlight key challenges, and outline future directions. As interest in emotionally intelligent systems grows, this survey provides timely guidance for advancing MERC research.
Argument pair extraction (APE) aims to extract interactive argument pairs from two argument passages. Existing works generally focus on either simple argument interaction or task-form conversion, rather than thorough exploitation of the deep-level features of argument pairs. To address this issue, we propose Semantics-Aware Dual Graph Convolutional Networks (SADGCN) for APE. Specifically, a co-occurrence word graph is designed to capture the lexical and semantic relevance of arguments, built with a pre-trained Rouge-guided Transformer (ROT). Considering the topic relevance in argument pairs, a topic graph is constructed by a neural topic model to leverage the topic information of the argument passages. The two graphs are fused via a gating mechanism, which contributes to the extraction of argument pairs. Experimental results indicate that our approach achieves state-of-the-art performance, improving the F1 score by 6.56% over the best existing alternative.
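As a minimal sketch of the gated dual-graph fusion described above: two graph-convolution branches (word co-occurrence graph and topic graph) produce node representations that a sigmoid gate blends per dimension. The simplified single-layer GCN and all names here are assumptions for illustration, not the SADGCN implementation.

```python
import torch
import torch.nn as nn

class GatedDualGraphFusion(nn.Module):
    """Hypothetical sketch: fuse co-occurrence-graph and topic-graph representations with a gate."""

    def __init__(self, dim: int):
        super().__init__()
        self.word_proj = nn.Linear(dim, dim)    # stand-in for a GCN layer over the word graph
        self.topic_proj = nn.Linear(dim, dim)   # stand-in for a GCN layer over the topic graph
        self.gate = nn.Linear(2 * dim, dim)

    def gcn_layer(self, proj: nn.Linear, adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Simple graph convolution: row-normalized adjacency times projected node features.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return torch.relu((adj / deg) @ proj(x))

    def forward(self, x: torch.Tensor, word_adj: torch.Tensor, topic_adj: torch.Tensor) -> torch.Tensor:
        h_word = self.gcn_layer(self.word_proj, word_adj, x)     # lexical/semantic view
        h_topic = self.gcn_layer(self.topic_proj, topic_adj, x)  # topic view
        g = torch.sigmoid(self.gate(torch.cat([h_word, h_topic], dim=-1)))
        return g * h_word + (1.0 - g) * h_topic                  # gated fusion of the two graphs
```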