Chen Gong
2026
Locate and Explain: Joint Multimodal Emotion Cause Extraction and Summarization in Conversation
Jikun Wan | Chen Gong | Guohong Fu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal emotion cause analysis in conversation aims to identify the causes of emotions by leveraging multimodal information. Existing studies mainly formulate this problem as either utterance-level emotion cause extraction, which provides clear cause localization but limited explanation, or multimodal emotion cause generation, which offers fine-grained explanations but lacks explicit traceability to source utterances. Moreover, existing datasets rely heavily on human judgment and lack well-defined, structured theoretical guidance, leading to subjective and inconsistent annotations. To address these issues, we introduce joint Multimodal Emotion Cause Extraction and Summarization in conversation (MECES), a new task that simultaneously extracts emotion cause utterances and generates cause summaries, enabling both precise localization and interpretable explanations of emotion causes. We further construct a MECES dataset guided by the Activating Events–Beliefs–Consequences theory from psychology. This dataset consists of 5,787 emotion utterances annotated with causes, comprising 12,231 emotion-cause pairs and 6,040 cause summaries. We also propose an effective end-to-end joint learning approach for the MECES task, establishing strong benchmark results for this newly introduced task and dataset.
Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models
Zhichao Sheng | Shilin Zhou | Chen Gong | Zhenghua Li
Findings of the Association for Computational Linguistics: ACL 2026
Large Audio Language Models (LALMs) employing the Chain-of-Thought paradigm have demonstrated remarkable reasoning capabilities. Though different problems naturally require varying depths of reasoning, existing methods often only determine whether to perform reasoning at all, lacking fine-grained mechanisms to adapt reasoning length to problem complexity. As a result, LALMs often adopt a one-size-fits-all reasoning strategy, leading to redundant overthinking for simple tasks and insufficient reasoning for complex ones. In this paper, we conduct an in-depth analysis of LALM reasoning behavior and argue that effective and efficient reasoning should be adaptively aligned with task difficulty. To this end, we propose a difficulty-adaptive reasoning method for LALMs. Specifically, we introduce a reward function that dynamically links reasoning length to the model’s perceived problem difficulty, encouraging shorter reasoning for easy tasks and longer reasoning for more complex ones. Extensive experiments on three datasets demonstrate that our method consistently improves performance while reducing average reasoning length by at least 50%, achieving higher efficiency without sacrificing accuracy.
2025
Multimodal Coreference Resolution for Chinese Social Media Dialogues: Dataset and Benchmark Approach
Xingyu Li | Chen Gong | Guohong Fu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal coreference resolution (MCR) aims to identify mentions referring to the same entity across different modalities, such as text and visuals, and is essential for understanding multimodal content. In the era of rapidly growing multimodal content and social media, MCR is particularly crucial for interpreting user interactions and bridging text-visual references to improve communication and personalization. However, MCR research for real-world dialogues remains unexplored due to the lack of sufficient data resources. To address this gap, we introduce TikTalkCoref, the first Chinese multimodal coreference dataset for social media in real-world scenarios, derived from the popular Douyin short-video platform. This dataset pairs short videos with corresponding textual dialogues from user comments and includes manually annotated coreference clusters for both person mentions in the text and the coreferential person head regions in the corresponding video frames. We also present an effective benchmark approach for MCR, focusing on the celebrity domain, and conduct extensive experiments on our dataset, providing reliable benchmark results for this newly constructed dataset. We release the TikTalkCoref dataset to facilitate future research on MCR for real-world social media dialogues at https://github.com/lxystaruni/TikTalkCoref.
System Report for CCL25-Eval Task 2: Enhanced Chinese Frame Semantic Parsing with Pre-trained Model and Linguistic Features
Yahui Liu | Ziheng Qiao | Chen Gong | Min Zhang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
This paper presents our system submitted to the Chinese Frame Semantic Parsing evaluation task at the 24th China National Conference on Computational Linguistics (CCL 2025). For the three subtasks of Frame Identification (FI), Argument Identification (AI), and Role Identification (RI), we used a larger Chinese pre-trained model as the foundation and adopted specific optimization strategies for the FI and RI subtasks. Specifically, we incorporated word segmentation structure information and updatable pre-trained target word embeddings in the FI subtask, and explored the use of Focal Loss combined with target word embeddings and word segmentation structure information in the RI subtask. Furthermore, a voting mechanism was employed in both the FI and RI subtasks to enhance performance. Our system ultimately achieved first place on TestA and second place on TestB.
Self-Correction Makes LLMs Better Parsers
Ziyan Zhang | Yang Hou | Chen Gong | Zhenghua Li
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) have achieved remarkable success across various natural language processing (NLP) tasks. However, recent studies suggest that they still face challenges in performing fundamental NLP tasks essential for deep language understanding, particularly syntactic parsing. In this paper, we conduct an in-depth analysis of LLM parsing capabilities, delving into the underlying causes of why LLMs struggle with this task and the specific shortcomings they exhibit. We find that LLMs may be limited in their ability to fully leverage grammar rules from existing treebanks, restricting their capability to generate syntactic structures. To help LLMs acquire knowledge without additional training, we propose a self-correction method that leverages grammar rules from existing treebanks to guide LLMs in correcting previous errors. Specifically, we automatically detect potential errors and dynamically search for relevant rules, offering hints and examples to guide LLMs in making corrections themselves. Experimental results on three datasets using various LLMs demonstrate that our method significantly improves performance in both in-domain and cross-domain settings.