Li Zhang

Google

Other people with similar names: Li Zhang (AWS), Li Zhang (Birmingham), Li Zhang (Google), Li Zhang (IBM-china), Li Zhang (Nankai), Li Zhang (Newcastle, UK), Li Zhang (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications), Li Zhang (Teesside University), Li Zhang (China Telecom Research Institute), Li Zhang (UC San Diego), Li Zhang (UK), Li Zhang (University of Pennsylvania), Li Zhang (Wuhan)




2024

Personalized Video Comment Generation
Xudong Lin | Ali Zare | Shiyuan Huang | Ming-Hsuan Yang | Shih-Fu Chang | Li Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024

Generating personalized responses, particularly in the context of video, poses a unique challenge for language models. This paper introduces the novel task of Personalized Video Comment Generation (PVCG), aiming to predict user comments tailored to both the input video and the user’s comment history, where the user is unseen during the model training process. Unlike existing video captioning tasks that ignore personalization in the text generation process, we introduce PerVidCom, a new dataset specifically collected for this novel task with diverse personalized comments from YouTube. Recognizing the limitations of existing captioning metrics for evaluating this task, we propose a new automatic metric based on Large Language Models (LLMs) with few-shot in-context learning, named FICL-Score, which specifically measures quality along the aspects of emotion, language style, and content relevance. We verify the proposed metric with human evaluations. We establish baselines using prominent Multimodal LLMs (MLLMs), analyze their performance discrepancies through extensive evaluation, and identify directions for future improvement on this important task. Our research opens up a new direction of personalizing MLLMs and paves the way for future research.
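The abstract describes FICL-Score only at a high level: an LLM is prompted with a few scored examples and asked to rate a candidate comment on emotion, language style, and content relevance. As an illustrative sketch only (not the paper's implementation — every function and field name below is hypothetical), such a few-shot in-context-learning scoring prompt might be assembled like this:

```python
# Hypothetical sketch of a few-shot in-context-learning scoring prompt
# for personalized comment quality. This is NOT the paper's FICL-Score
# implementation; all names and the prompt wording are illustrative.

def build_scoring_prompt(few_shot_examples, candidate_comment, user_history):
    """Assemble a prompt asking an LLM to rate a generated comment
    on three aspects (emotion, language style, content relevance)."""
    lines = [
        "Rate the candidate comment on emotion, language style, and "
        "content relevance (1-5 each), given the user's comment history."
    ]
    # Each example is a (history, comment, scores) triple shown in full,
    # so the model can infer the expected output format in context.
    for history, comment, scores in few_shot_examples:
        lines.append(f"History: {history}")
        lines.append(f"Comment: {comment}")
        lines.append(f"Scores: {scores}")
    # The candidate is appended in the same format, with scores left
    # blank for the LLM to complete.
    lines.append(f"History: {user_history}")
    lines.append(f"Comment: {candidate_comment}")
    lines.append("Scores:")
    return "\n".join(lines)

prompt = build_scoring_prompt(
    few_shot_examples=[("loves soccer highlights", "Great goal!", "5,4,5")],
    candidate_comment="What a save by the keeper!",
    user_history="loves soccer highlights",
)
```

In practice the returned string would be sent to an LLM and the completed scores parsed from its reply; the real metric's prompt design and aggregation are detailed in the paper itself.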