Yuting Wang
2026
From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents
Niu Lian | Yuting Wang | Hanshu Yao | Jinpeng Wang | Bin Chen | Yaowei Wang | Min Zhang | Shu-Tao Xia
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While multimodal large language models have demonstrated impressive short-term reasoning, they struggle with long-horizon video understanding due to limited context windows and static memory mechanisms that fail to mirror human cognitive efficiency. Existing paradigms typically fall into two extremes: vision-centric methods that incur high latency and redundancy through dense visual accumulation, or text-centric approaches that suffer from detail loss and hallucination via aggressive captioning. To bridge this gap, we propose **MM-Mem**, a pyramidal multimodal memory architecture grounded in *Fuzzy-Trace Theory*. **MM-Mem** structures memory hierarchically into a *Sensory Buffer*, *Episodic Stream*, and *Symbolic Schema*, enabling the progressive distillation of fine-grained perceptual traces (*verbatim*) into high-level semantic schemas (*gist*). Furthermore, to govern the dynamic construction of memory, we derive a Semantic Information Bottleneck objective and introduce SIB-GRPO to optimize the trade-off between memory compression and task-relevant information retention. In inference, we design an entropy-driven top-down memory retrieval strategy. Extensive experiments across 4 benchmarks confirm that **MM-Mem** achieves state-of-the-art performance on both offline and streaming tasks, demonstrating robust generalization and validating the effectiveness of cognition-inspired memory organization. Code and associated configurations are publicly available at https://github.com/EliSpectre/MM-Mem.
2022
Euphemism Detection by Transformers and Relational Graph Attention Network
Yuting Wang | Yiyi Liu | Ruqing Zhang | Yixing Fan | Jiafeng Guo
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)
Euphemism is a type of figurative language broadly adopted in social media and daily conversations. People use euphemisms for politeness or to conceal what they are discussing. Euphemism detection is a challenging task because of its obscure and figurative nature. Even humans may not agree on whether a word expresses a euphemism. In this paper, we propose to employ Bidirectional Encoder Representations from Transformers (BERT) and a relational graph attention network to model the semantic and syntactic relations between the target words and the input sentence. Our best-performing method reaches a Macro-F1 score of 84.0 on the euphemism detection dataset of the shared task at the Third Workshop on Figurative Language Processing (2022).