Beibei Zhang

2026

Coarse-to-Fine Multimodal Information Selection for Video Speaking Style Recognition with Large Language Models
Beibei Zhang | Yanan Lu | Lin Fen | Tongwei Ren
Findings of the Association for Computational Linguistics: ACL 2026

Video Speaking Style Recognition (VSSR) aims to classify conversation videos into different types, significantly facilitating human interaction understanding. Recent approaches explore the potential of large language models (LLMs) in VSSR with a training-free process. However, directly integrating all multimodal data yields suboptimal results, since the great redundancy in visual data can overshadow other valuable multimodal information, such as textual dialogues and critical visual clues. To address this, we propose CFMiS (Coarse-to-Fine Multimodal Information Selection), a novel framework for VSSR that dynamically obtains valuable multimodal data via coarse-to-fine selection, enhancing LLM reasoning for VSSR. Specifically, the core of CFMiS consists of two cascaded modules: 1) a text-dominant modality selection module first selects VSSR-required modalities originating from text-based prediction; and 2) if vision is included in the selected modalities, a visual refinement module iteratively collects VSSR-relevant critical visual clues. The former resolves which modality to utilize, while the latter determines which information to adopt from the selected modalities, efficiently alleviating information redundancy. Extensive experiments on multiple datasets show that CFMiS is highly effective for VSSR, outperforming all existing training-free approaches and most training-based methods.

2024


Prototype-based Prompt-Instance Interaction with Causal Intervention for Few-shot Event Detection
Jingyao Tang | Lishuang Li | Hongbin Lu | Xueyang Qin | Beibei Zhang | Haiming Wu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Few-shot Event Detection (FSED) is a meaningful task due to the limited labeled data and expensive manual labeling. Some prompt-based methods are used in FSED. However, these methods require large GPU memory due to the increased length of input tokens caused by concatenating prompts, as well as additional human effort for designing verbalizers. Moreover, they ignore instance and prompt biases arising from the confounding effects between prompts and texts. In this paper, we propose a prototype-based prompt-instance Interaction with causal Intervention (2xInter) model to conveniently utilize both prompts and verbalizers and effectively eliminate all biases. Specifically, 2xInter first presents a Prototype-based Prompt-Instance Interaction (PPII) module that applies an interactive approach for texts and prompts to reduce memory and regards class prototypes as verbalizers to avoid design costs. Next, 2xInter constructs a Structural Causal Model (SCM) to explain instance and prompt biases and designs a Double-View Causal Intervention (DVCI) module to eliminate these biases. Due to limited supervised information, DVCI devises a generation-based prompt adjustment for instance intervention and a Siamese network-based instance contrasting for prompt intervention. Finally, the experimental results show that 2xInter achieves state-of-the-art performance on RAMS and ACE datasets.


Biomedical Event Causal Relation Extraction (BECRE) is an important task in biomedical information extraction. Existing methods usually use pre-trained language models to learn semantic representations and then predict the event causal relation. However, these methods struggle to capture sufficient cues in biomedical texts for predicting causal relations. In this paper, we propose a Path Reasoning-based Relation-aware Network (PRRN) to explore deeper cues for causal relations using reinforcement learning. Specifically, our model reasons the relation paths between entity arguments of two events, namely entity relation path, which connects the two biomedical events through the multi-hop interactions between entities to provide richer cues for predicting event causal relations. In PRRN, we design a path reasoning module based on reinforcement learning and propose a novel reward function to encourage the model to focus on the length and contextual relevance of entity relation paths. The experimental results on two datasets suggest that PRRN brings considerable improvements over the state-of-the-art models.
