Xiaoyan Yu
2026
Editing the Moving World: Model Editing for Video LLMs
Qian Zhang | Xinye Li | Xiaokai Wu | Junhao Xu | Zhanyue Qin | Qingbin Liu | Junxian Cai | Xi Chen | Bolin Zhang | Zhiying Tu | Dianhui Chu | Xiaoyan Yu | Dianbo Sui
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Model Editing, also known as knowledge editing, is receiving increasing attention in the field of Large Language Models (LLMs). However, existing model editing approaches predominantly focus on knowledge-level or static visual domains, overlooking dynamic semantics. As an exploratory study, this paper applies six representative model editing methods (FT, IKE, MEND, SERAC, MEMIT and AlphaEdit) to Video Large Language Models (Vid-LLMs) and introduces VMEB (Vid-LLMs Model Editing Benchmark), the first benchmark specifically designed for Vid-LLM editing, systematically extending model editing research from static modalities to dynamic video scenarios. We position this work as a forward-looking benchmark and a foundational diagnostic study: in the video paradigm, our evaluation dimensions encompass the traditional metrics of Reliability, Locality, and Generality, and add a video-specific metric, Robustness. Based on experimental results, we analyze the strengths and limitations of existing model editing approaches, and identify new challenges and research directions for the future development of the model editing field within the context of multimodal and video paradigms. Our benchmark is available at https://github.com/Sakabamrisa/VMEB.
2025
Multi-View Incongruity Learning for Multimodal Sarcasm Detection
Diandian Guo | Cong Cao | Fangfang Yuan | Yanbing Liu | Guangjie Zeng | Xiaoyan Yu | Hao Peng | Philip S. Yu
Proceedings of the 31st International Conference on Computational Linguistics
Multimodal sarcasm detection (MSD) is essential for various downstream tasks. Existing MSD methods tend to rely on spurious correlations. These methods often mistakenly prioritize non-essential features yet still make correct predictions, demonstrating poor generalizability beyond training environments. Regarding this phenomenon, this paper undertakes several initiatives. Firstly, we identify two primary causes that lead to the reliance on spurious correlations. Secondly, we address these challenges by proposing a novel method that integrates Multimodal Incongruities via Contrastive Learning (MICL) for multimodal sarcasm detection. Specifically, we first leverage incongruity to drive multi-view learning from three views: token-patch, entity-object, and sentiment. Then, we introduce extensive data augmentation to mitigate the biased learning of the textual modality. Additionally, we construct a test set, SPMSD, which consists of potential spurious correlations, to evaluate the model’s generalizability. Experimental results demonstrate the superiority of MICL on benchmark datasets, and further analyses show that MICL mitigates the effect of spurious correlations.
2024
UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models
Zhanyue Qin | Haochuan Wang | Deyuan Liu | Ziyang Song | Cunhang Fan | Zhao Lv | Jinlin Wu | Zhen Lei | Zhiying Tu | Dianhui Chu | Xiaoyan Yu | Dianbo Sui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Sequential decision-making refers to algorithms that take into account the dynamics of the environment, where early decisions affect subsequent decisions. With large language models (LLMs) demonstrating powerful capabilities across tasks, we can’t help but ask: Can current LLMs effectively make sequential decisions? To answer this question, we propose the UNO Arena, based on the card game UNO, to evaluate the sequential decision-making capability of LLMs, and we explain in detail why we choose UNO. In UNO Arena, we evaluate the sequential decision-making capability of LLMs dynamically with novel metrics based on Monte Carlo methods. We set up random players, DQN-based reinforcement learning players, and LLM players (e.g. GPT-4, Gemini-pro) for comparison testing. Furthermore, to improve the sequential decision-making capability of LLMs, we propose the TUTRI player, which has LLMs reflect on their own actions using a summary of the game history and the game strategy. Numerous experiments demonstrate that the TUTRI player achieves a notable breakthrough in the performance of sequential decision-making compared to the vanilla LLM player.
Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent
Xiaoyan Yu | Tongxu Luo | Yifan Wei | Fangyu Lei | Yiming Huang | Hao Peng | Liehuang Zhu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) have revolutionized open-domain dialogue agents but encounter challenges in multi-character role-playing (MCRP) scenarios. To address the issue, we present Neeko, an innovative framework designed for efficient imitation of multiple characters. Neeko employs a dynamic low-rank adapter (LoRA) strategy, enabling it to adapt seamlessly to diverse characters. Our framework breaks down the role-playing process into agent pre-training, multiple characters playing, and character incremental learning, effectively handling both seen and unseen roles. This dynamic approach, coupled with distinct LoRA blocks for each character, enhances Neeko’s adaptability to unique attributes, personalities, and speaking patterns. As a result, Neeko demonstrates superior performance in MCRP over most existing methods, offering more engaging and versatile user interaction experiences.
2023
MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models
Yifan Wei | Yisong Su | Huanhuan Ma | Xiaoyan Yu | Fangyu Lei | Yuanzhe Zhang | Jun Zhao | Kang Liu
Findings of the Association for Computational Linguistics: EMNLP 2023
Large language models (LLMs) have shown nearly saturated performance on many natural language processing (NLP) tasks. As a result, it is natural for people to believe that LLMs have also mastered abilities such as time understanding and reasoning. However, research on the temporal sensitivity of LLMs has received insufficient attention. To fill this gap, this paper constructs Multiple Sensitive Factors Time QA (MenatQA), which encompasses three temporal factors (scope factor, order factor, counterfactual factor) with a total of 2,853 samples for evaluating the time comprehension and reasoning abilities of LLMs. This paper tests current mainstream LLMs with different parameter sizes, ranging from billions to hundreds of billions. The results show that most LLMs fall behind smaller temporal reasoning models, to different degrees, on these factors. Specifically, LLMs show a significant vulnerability to temporal biases and depend heavily on the temporal information provided in questions. Furthermore, this paper undertakes a preliminary investigation into potential improvement strategies by devising specific prompts and leveraging external tools. These approaches serve as valuable baselines or references for future research endeavors.
2018
The UIR Uncertainty Corpus for Chinese: Annotating Chinese Microblog Corpus for Uncertainty Identification from Social Media
Binyang Li | Jun Xiang | Le Chen | Xu Han | Xiaoyan Yu | Ruifeng Xu | Tengjiao Wang | Kam-fai Wong
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Co-authors
- Dianhui Chu 2
- Fangyu Lei 2
- Hao Peng 2
- Zhanyue Qin 2
- Dianbo Sui 2
- Zhiying Tu 2
- Yifan Wei 2
- Junxian Cai 1
- Cong Cao 1
- Xi Chen 1
- Le Chen 1
- Cunhang Fan 1
- Diandian Guo 1
- Xu Han 1
- Yiming Huang 1
- Zhen Lei 1
- Xinye Li 1
- Binyang Li 1
- Yanbing Liu 1
- Deyuan Liu 1
- Qingbin Liu 1
- Kang Liu 1
- Tongxu Luo 1
- Zhao Lv 1
- Huanhuan Ma 1
- Ziyang Song 1
- Yisong Su 1
- Haochuan Wang 1
- Tengjiao Wang 1
- Kam-Fai Wong 1
- Jinlin Wu 1
- Xiaokai Wu 1
- Jun Xiang 1
- Junhao Xu 1
- Ruifeng Xu (徐睿峰) 1
- Philip S. Yu 1
- Fangfang Yuan 1
- Guangjie Zeng 1
- Qian Zhang 1
- Bolin Zhang 1
- Yuanzhe Zhang 1
- Jun Zhao 1
- Liehuang Zhu 1