Ge Shi
Sequential recommender systems, which leverage historical interactions to deliver targeted recommendations, have been significantly advanced by large language models (LLMs). However, LLM-based generative sequential recommendation often faces two key challenges: the lack of collaborative knowledge and the limited controllability over the generated content. In this paper, we propose a simple Bi-Tuning framework with collaborative information for controllable Large Language Model-based Sequential Recommendation (Laser). Specifically, Bi-Tuning works by incorporating learnable virtual tokens at both the prefix and the suffix of the input text: the prefix tokens enable the adaptation of LLMs with collaborative information, while the suffix token transforms the LLM output into item/user embeddings for similarity comparison, thereby facilitating controllable recommendations. Furthermore, we introduce an MoE-based querying transformer that selectively activates experts to extract relevant information from the varying collaborative signals of frozen ID-based recommenders into the prefix, coupled with a multi-task loss function incorporating the MoE load-balancing objective. Finally, a two-phase training strategy is employed to progressively obtain high-quality item and user embeddings through the learnable suffix. Experiments on real-world datasets show that Laser effectively adapts LLMs for sequential recommendation, outperforming state-of-the-art baselines.
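A minimal PyTorch sketch of the Bi-Tuning idea described in this abstract, assuming a Hugging Face-style frozen base LLM that accepts inputs_embeds; the class name, shapes, and the plain learnable prefix (in place of the MoE querying transformer over ID-based recommender signals) are illustrative simplifications, not the authors' implementation:

```python
import torch
import torch.nn as nn

class BiTuningWrapper(nn.Module):
    """Learnable prefix/suffix virtual tokens around a frozen LLM (sketch)."""
    def __init__(self, llm, hidden_size, n_prefix=8, n_suffix=1, emb_dim=64):
        super().__init__()
        self.llm = llm                              # frozen decoder-only LLM
        for p in self.llm.parameters():
            p.requires_grad = False
        # In Laser the prefix would carry collaborative information produced by
        # an MoE querying transformer; here it is a plain learnable parameter.
        self.prefix = nn.Parameter(torch.randn(n_prefix, hidden_size) * 0.02)
        self.suffix = nn.Parameter(torch.randn(n_suffix, hidden_size) * 0.02)
        # Maps the final suffix hidden state to a user/item embedding.
        self.proj = nn.Linear(hidden_size, emb_dim)

    def forward(self, text_embeds):                 # (batch, seq, hidden)
        b = text_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(b, -1, -1)
        suffix = self.suffix.unsqueeze(0).expand(b, -1, -1)
        inputs = torch.cat([prefix, text_embeds, suffix], dim=1)
        hidden = self.llm(inputs_embeds=inputs).last_hidden_state
        # Read the embedding out at the (last) suffix position.
        return self.proj(hidden[:, -1])             # (batch, emb_dim)

# Controllable recommendation then reduces to similarity search:
# scores = user_emb @ item_embs.T, followed by a top-k selection.
```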
Existing video LLMs typically excel at capturing the overall description of a video but lack the ability to demonstrate an understanding of temporal dynamics and a fine-grained grasp of localized content within the video. In this paper, we propose Time-Perception Enhanced Video Grounding via Boundary Perception and Temporal Reasoning, aimed at mitigating LLMs' difficulties in understanding the discrepancies between video and text temporality. Specifically, to address the inherent biases in current datasets, we design a series of boundary-perception tasks that enable LLMs to capture accurate video temporality. To tackle LLMs' insufficient understanding of temporal information, we further develop specialized tasks for boundary perception and temporal relationship reasoning that deepen LLMs' perception of video temporality. Our experimental results show significant improvements across three datasets: ActivityNet, Charades, and DiDeMo (achieving up to 11.2% improvement on R@0.3), demonstrating the effectiveness of our proposed temporal awareness-enhanced data construction method.
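As a rough illustration of what a boundary-perception training sample might look like, a hedged sketch follows; the field names and prompt wording are assumptions, not the paper's actual data format:

```python
def make_boundary_sample(video_id, start_sec, end_sec, caption):
    """Build one boundary-perception QA pair for a captioned video segment."""
    question = (f"In the video, when does the moment described by "
                f"'{caption}' start and end? Answer in seconds.")
    answer = f"It starts at {start_sec:.1f}s and ends at {end_sec:.1f}s."
    return {"video": video_id, "question": question, "answer": answer}

print(make_boundary_sample("v_001", 12.4, 31.9, "a person opens the door"))
```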
This paper presents our system for Subtask 10 of Entity Framing, which focuses on assigning one or more hierarchical roles to named entities in news articles. Our approach iteratively refines prompts and utilizes an Entity-Centric Chain of Thought to complete the task. Specifically, to minimize ambiguity in label definitions, we use the model's predictions as supervisory signals, iteratively refining the category definitions. Furthermore, to minimize the interference of irrelevant information during inference, we incorporate entity-related information into the CoT framework, allowing the model to focus more effectively on entity-centric reasoning. Our system achieved the highest ranking on the leaderboard for Russian main-role classification and second place for English, with accuracies of 0.8645 and 0.9362, respectively. We discuss the impact of several components of our multilingual classification approach, highlighting their effectiveness.
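A hedged sketch of the iterative label-definition refinement loop described above; the llm() callable and the refinement prompts are placeholders for whatever model and wording the system actually uses:

```python
def refine_definitions(llm, definitions, dev_samples, rounds=3):
    """Use the model's own mistakes as a signal to rewrite role definitions."""
    for _ in range(rounds):
        errors = []
        for text, entity, gold in dev_samples:
            prompt = (f"Role definitions:\n{definitions}\n\n"
                      f"Article: {text}\nEntity: {entity}\n"
                      f"Reason about this entity's role step by step, "
                      f"then answer with a single role label.")
            pred = llm(prompt).strip()
            if pred != gold:
                errors.append((entity, pred, gold))
        if not errors:
            break
        feedback = "\n".join(f"{e}: predicted {p}, expected {g}"
                             for e, p, g in errors)
        # Ask the model to revise the definitions so the confusions are resolved.
        definitions = llm(f"Revise these role definitions to resolve the "
                          f"confusions below.\n\nDefinitions:\n{definitions}\n\n"
                          f"Confusions:\n{feedback}")
    return definitions
```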
Move structures have been studied in English for Specific Purposes (ESP) and English for Academic Purposes (EAP) for decades. However, there are few move annotation corpora for Research Article (RA) abstracts. In this paper, we introduce RAAMove, a comprehensive multi-domain corpus dedicated to the annotation of move structures in RA abstracts. The primary objective of RAAMove is to facilitate move analysis and automatic move identification. This paper provides a thorough discussion of the corpus construction process, including the annotation scheme, data collection, annotation guidelines, and annotation procedures. The corpus is constructed in two stages: initially, expert annotators manually annotate high-quality data; subsequently, based on the human-annotated data, a BERT-based model is employed for automatic annotation, with experts revising its output. The result is a large-scale, high-quality corpus comprising 33,988 annotated instances. We also conduct preliminary move identification experiments using the BERT-based model to verify the effectiveness of the proposed corpus and model. The annotated corpus is available for academic research purposes and can serve as an essential resource for move analysis, English language teaching and writing, as well as move/discourse-related tasks in Natural Language Processing (NLP).
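A minimal sketch of BERT-based move identification over abstract sentences, using the standard Hugging Face API; the label set shown is only illustrative, not the corpus's actual annotation scheme:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["Background", "Purpose", "Method", "Result", "Conclusion"]  # assumed set
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

sentence = "We propose a new corpus of annotated research article abstracts."
inputs = tok(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
# Without fine-tuning on the corpus, this prediction is effectively random.
print(labels[logits.argmax(-1).item()])
```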
The event extraction task typically consists of event detection and event argument extraction. Most previous work models these two subtasks with shared representations, either through multiple classification tasks or through a unified generative approach. In this paper, we revisit this pattern and propose to use independent encoders to model event detection and event argument extraction, respectively, using the output of event detection to construct the input of event argument extraction. In addition, we use token-level features to precisely control the fusion between the two encoders, achieving joint bridging training rather than directly reusing representations between different tasks. Through a series of careful experiments, we demonstrate the importance of avoiding feature interference between different tasks and of joint bridging training. We achieve competitive results on standard benchmarks (ACE05-E, ACE05-E+, and ERE-EN) and establish a solid baseline.
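A rough PyTorch sketch of the two-encoder idea: a detection encoder predicts triggers, and its token-level features are gated into the argument encoder's input. The encoder choice, shapes, and the sigmoid gate are illustrative simplifications rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class BridgedEventExtractor(nn.Module):
    def __init__(self, hidden=256, n_event_types=10, n_roles=5):
        super().__init__()
        self.det_encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.arg_encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.trigger_head = nn.Linear(hidden, n_event_types)
        self.gate = nn.Linear(2 * hidden, hidden)   # token-level fusion control
        self.role_head = nn.Linear(hidden, n_roles)

    def forward(self, token_embeds):                # (batch, seq, hidden)
        det_h, _ = self.det_encoder(token_embeds)
        trigger_logits = self.trigger_head(det_h)
        # A token-level gate decides how much detection information flows
        # into the argument encoder, instead of reusing representations wholesale.
        g = torch.sigmoid(self.gate(torch.cat([token_embeds, det_h], dim=-1)))
        arg_h, _ = self.arg_encoder(token_embeds + g * det_h)
        return trigger_logits, self.role_head(arg_h)
```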
Event extraction aims to recognize pre-defined event triggers and arguments in text, and it suffers from a lack of high-quality annotations. In most NLP applications, incorporating large-scale synthetic training data is a practical and effective approach to alleviating data scarcity. However, when applied to event extraction, recent data augmentation methods often neglect the problems of grammatical incorrectness, structural misalignment, and semantic drift, leading to unsatisfactory performance. To solve these problems, we propose a denoised structure-to-text augmentation framework for event extraction (DAEE), which generates additional training data through a knowledge-based structure-to-text generation model and iteratively selects an effective subset of the generated data with a deep reinforcement learning agent. Experimental results on several datasets demonstrate that the proposed method generates more diverse text representations for event extraction and achieves results comparable to the state of the art.
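A simplified stand-in for the iterative selection loop: each round, a batch of generated examples is kept only if it improves a dev-set reward. The real method uses a deep reinforcement learning agent as the selection policy; this greedy loop, and the train_model/evaluate callables, are purely illustrative:

```python
def select_augmentation(train_model, evaluate, gold_data, generated,
                        rounds=5, k=100):
    """Greedily grow a selected subset of synthetic data by dev-set reward."""
    selected = []
    best_score = evaluate(train_model(gold_data))
    for _ in range(rounds):
        pool = [ex for ex in generated if ex not in selected]
        candidates = pool[:k]            # in DAEE, the RL agent proposes these
        score = evaluate(train_model(gold_data + selected + candidates))
        if score > best_score:           # reward: improvement on the dev set
            selected.extend(candidates)
            best_score = score
    return selected
```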
We consider event extraction in a generative manner with template-based conditional generation. Although there is a rising trend of casting event extraction as a sequence generation problem with prompts, these generation-based methods face two significant challenges: suboptimal prompts and static event type information. In this paper, we propose a generative template-based event extraction method with dynamic prefix (GTEE-DynPref) that integrates context information with type-specific prefixes to learn a context-specific prefix for each input. Experimental results show that our model achieves competitive results with the state-of-the-art classification-based model OneIE on ACE 2005 and achieves the best performance on ERE. Additionally, our model can be effectively transferred to new event types.
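A sketch of one way a dynamic prefix could be formed: type-specific prefix parameters are mixed with weights computed from the context encoding, yielding a context-specific prefix. The mixing mechanism and dimensions are assumptions, not the exact GTEE-DynPref design:

```python
import torch
import torch.nn as nn

class DynamicPrefix(nn.Module):
    def __init__(self, n_types, prefix_len=10, hidden=768):
        super().__init__()
        # One static learnable prefix per event type.
        self.type_prefixes = nn.Parameter(
            torch.randn(n_types, prefix_len, hidden) * 0.02)
        self.attn = nn.Linear(hidden, n_types)

    def forward(self, context_vec):                  # (batch, hidden), e.g. [CLS]
        weights = torch.softmax(self.attn(context_vec), dim=-1)  # (batch, n_types)
        # Context-specific prefix = attention-weighted mixture of type prefixes.
        return torch.einsum("bt,tlh->blh", weights, self.type_prefixes)

prefix = DynamicPrefix(n_types=33)(torch.randn(2, 768))
print(prefix.shape)    # torch.Size([2, 10, 768])
```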
Relation Extraction suffers a dramatic performance decrease when a model trained on one genre is applied directly to a new genre, due to distinct feature distributions. Previous studies address this problem by discovering a shared space across genres using manually crafted features, which requires great human effort. To effectively automate this process, we design a genre-separation network, which applies two encoders, one genre-independent and one genre-shared, to explicitly extract genre-specific and genre-agnostic features. We then train a relation classifier on the genre-agnostic features of the source genre and directly apply it to the target genre. Experimental results on three distinct genres of the ACE dataset show that our approach achieves up to 6.1% absolute F1-score gain compared to previous methods. By incorporating a set of external linguistic features, our approach outperforms the state of the art by 1.7% absolute F1 gain. We make all programs of our model publicly available for research purposes.
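A minimal sketch of the genre-separation idea: two parallel encoders, with the relation classifier reading only the genre-agnostic features. The layer sizes, auxiliary genre classifier, and omission of adversarial training details are simplifying assumptions:

```python
import torch
import torch.nn as nn

class GenreSeparationNet(nn.Module):
    def __init__(self, in_dim=300, hidden=128, n_relations=7, n_genres=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.specific = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.rel_clf = nn.Linear(hidden, n_relations)  # uses shared features only
        self.genre_clf = nn.Linear(hidden, n_genres)   # supervises the other encoder

    def forward(self, x):                              # (batch, in_dim)
        shared, specific = self.shared(x), self.specific(x)
        return self.rel_clf(shared), self.genre_clf(specific)

rel_logits, genre_logits = GenreSeparationNet()(torch.randn(4, 300))
print(rel_logits.shape, genre_logits.shape)   # torch.Size([4, 7]) torch.Size([4, 3])
```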