Feng Hu
2026
G-Cap: A Game Character Caption Generator
Yang Yang | Feng Hu | Haiming Zhang | XU Cheng | Gui Zheng | Liang Yao | Wenqi Ren
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While Large Vision-Language Models (LVLMs) have demonstrated remarkable proficiency in image captioning, existing research primarily focuses on real-world scenarios, leaving surreal, highly stylized, and semantically hybrid virtual-world scenarios significantly underexplored. In this work, we introduce Game Character Captioning, a novel task designed to evaluate LVLMs’ capability to perceive and describe game characters from the virtual world. To facilitate evaluation, we establish GC-Bench, a manually annotated benchmark, and propose Graph-F1 to effectively assess performance on this task. Our evaluation reveals that: (1) current state-of-the-art LVLMs, including closed-source giants, struggle to maintain the high performance seen in real-world scenarios; and (2) a notable gap exists between open-source and closed-source models. To bridge this gap, we construct GC-148K, a large-scale dataset generated via a specialized data pipeline, and develop the G-Cap series. Experiments demonstrate that the G-Cap series rivals the performance of advanced closed-source models at a lower cost, offering an efficient solution for industrial-grade production environments.
2024
Dynamic Multi-granularity Attribution Network for Aspect-based Sentiment Analysis
Yanjiang Chen | Kai Zhang | Feng Hu | Xianquan Wang | Ruikang Li | Qi Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Aspect-based sentiment analysis (ABSA) aims to predict the sentiment polarity of a specific aspect within a given sentence. Most existing methods leverage semantic or syntactic information based on attention scores, which are susceptible to interference from irrelevant contexts and often lack sentiment knowledge at a data-specific level. In this paper, we propose a novel Dynamic Multi-granularity Attribution Network (DMAN) from the perspective of attribution. First, we leverage Integrated Gradients to dynamically extract attribution scores for each token, which contain underlying reasoning knowledge for sentiment analysis. Next, we aggregate attribution representations from multiple semantic granularities in natural language, deepening the understanding of the semantics. Finally, we integrate attribution scores with syntactic information to capture the relationships between aspects and their relevant contexts more accurately during the sentence understanding process. Extensive experiments on five benchmark datasets demonstrate the effectiveness of our proposed method.
I-AM-G: Interest Augmented Multimodal Generator for Item Personalization
Xianquan Wang | Likang Wu | Shukang Yin | Zhi Li | Yanjiang Chen | Feng Hu | Yu Su | Qi Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The emergence of personalized generation has made it possible to create texts or images that meet the unique needs of users. Recent advances mainly focus on style or scene transfer based on given keywords. However, in e-commerce and recommender systems, it is almost an untouched area to explore users' historical interactions, automatically mine user interests with semantic associations, and create item representations that closely align with users' individual interests. In this paper, we propose a brand new framework called **I**nterest-**A**ugmented **M**ultimodal **G**enerator (**I-AM-G**). The framework first extracts tags from the multimodal information of items that the user has interacted with, and the most frequently occurring ones are used to rewrite the text description of the item. Then, the framework uses a decoupled text-to-text and image-to-image retriever to search for the top-K similar item text and image embeddings from the item pool. Finally, the Attention module for user interests fuses the retrieved information in a cross-modal manner and further guides the personalized generation process in collaboration with the rewritten text. We conducted extensive and comprehensive experiments to demonstrate that our framework can effectively generate results aligned with user preferences, which potentially provides a new paradigm of **Rewrite and Retrieve** for personalized generation.