Weiyao Luo


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Beyond Single Frames: Can LMMs Comprehend Implicit Narratives in Comic Strip?
Xiaochen Wang | Heming Xia | Jialin Song | Longyu Guan | Qingxiu Dong | Rui Li | Yixin Yang | Yifan Pu | Weiyao Luo | Yiru Wang | Xiangdi Meng | Wenjie Li | Zhifang Sui
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Multimodal Models (LMMs) have demonstrated strong performance on vision-language benchmarks, yet current evaluations predominantly focus on single-image reasoning. In contrast, real-world scenarios always involve understanding sequences of images. A typical scenario is comic strips understanding, which requires models to perform nuanced visual reasoning beyond surface-level recognition. To address this gap, we introduce STRIPCIPHER , a benchmark designed to evaluate the model ability on understanding implicit narratives in silent comics. STRIPCIPHER is a high-quality, human-annotated dataset featuring fine-grained annotations and comprehensive coverage of varying difficulty levels. It comprises three tasks: visual narrative comprehension, contextual frame prediction, and temporal narrative reordering. % , covering various difficulty. Notably, evaluation results on STRIPCIPHER reveals a significant gap between current LMMs and human performance—e.g., GPT-4o achieves only 23.93% accuracy in the reordering task, 56.07% below human levels. These findings underscore the limitations of current LMMs in implicit visual narrative understanding and highlight opportunities for advancing sequential multimodal reasoning.

2024

pdf bib
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens
Weiyao Luo | Suncong Zheng | Heming Xia | Weikang Wang | Yan Lei | Tianyu Liu | Shuang Chen | Zhifang Sui
Findings of the Association for Computational Linguistics: EMNLP 2024

Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer a performance degradation when modeling long-term contexts due to they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath, encouraging them to summarize information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert special token <SR> at the end of each chunk. We then modify the attention mask to integrate the chunk’s information into the corresponding <SR> token. This facilitates LLMs to interpret information not only from historical individual tokens but also from the <SR> token, aggregating the chunk’s semantic information. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.

pdf bib
FaGANet: An Evidence-Based Fact-Checking Model with Integrated Encoder Leveraging Contextual Information
Weiyao Luo | Junfeng Ran | Zailong Tian | Sujian Li | Zhifang Sui
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In the face of the rapidly growing spread of false and misleading information in the real world, manual evidence-based fact-checking efforts become increasingly challenging and time-consuming. In order to tackle this issue, we propose FaGANet, an automated and accurate fact-checking model that leverages the power of sentence-level attention and graph attention network to enhance performance. This model adeptly integrates encoder-only models with graph attention network, effectively fusing claims and evidence information for accurate identification of even well-disguised data. Experiment results showcase the significant improvement in accuracy achieved by our FaGANet model, as well as its state-of-the-art performance in the evidence-based fact-checking task. We release our code and data in https://github.com/WeiyaoLuo/FaGANet.