Ziheng Wang
Papers on this page may belong to the following people: Ziheng Wang, Ziheng Wang
2026
Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding
Ke Ma | Jiaqi Tang | Bin Guo | Xueting Han | Ruonan Xu | Qingfeng He | Ziheng Wang | Xu Wang | Qifeng Chen | Zhiwen Yu | Yunhao Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Proactive streaming video understanding requires Video-LLMs to decide when to respond as a video unfolds, a task where existing methods often fall short due to their implicit, query-agnostic modeling of visual evidence. We introduce Response-G1, a novel framework that establishes explicit, structured alignment between the accumulated video evidence and the query’s expected response conditions via scene graphs. The framework operates in three fine-tuning-free stages: (1) online query-guided scene graph generation from streaming clips; (2) memory-based retrieval of the most semantically relevant historical scene graphs; and (3) retrieval-augmented trigger prompting for per-frame "silence/response" decisions. By grounding both evidence and conditions in a shared graph representation, Response-G1 achieves more interpretable and accurate response timing decisions. Experimental results on established benchmarks demonstrate the superiority of our method in both proactive and reactive tasks, validating the advantage of explicit scene graph modeling and retrieval in streaming video understanding.
2025
EquiBench: Benchmarking Large Language Models’ Reasoning about Program Semantics via Equivalence Checking
Anjiang Wei | Jiannan Cao | Ran Li | Hongyu Chen | Yuhui Zhang | Ziheng Wang | Yuan Liu | Thiago S. F. X. Teixeira | Diyi Yang | Ke Wang | Alex Aiken
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
As large language models (LLMs) become integral to code-related tasks, a central question emerges: Do LLMs truly understand program semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e., determining whether two programs produce identical outputs for all possible inputs. Unlike prior code generation benchmarks, this task directly tests a model’s ability to reason about program semantics. EquiBench consists of 2400 program pairs across four languages and six categories. These pairs are generated through program analysis, compiler scheduling, and superoptimization, ensuring high-confidence labels, nontrivial difficulty, and full automation. We evaluate 19 state-of-the-art LLMs and find that in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline. Further analysis reveals that models often rely on syntactic similarity rather than exhibiting robust reasoning about program semantics, highlighting current limitations. Our code and dataset are publicly available at https://github.com/Anjiang-Wei/equibench
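The equivalence-checking task described in this abstract can be illustrated with a toy sketch. The three functions and the `find_counterexample` helper below are hypothetical examples, not drawn from EquiBench. Bounded testing of this kind can refute equivalence by finding a differing input, but it cannot prove equivalence over all inputs.

```python
from typing import Callable, Iterable, Optional


def sum_loop(n: int) -> int:
    """Sum the integers 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_closed_form(n: int) -> int:
    """Same function as sum_loop, but syntactically very different."""
    return n * (n - 1) // 2 if n > 0 else 0


def sum_off_by_one(n: int) -> int:
    """Syntactically close to sum_loop, but it also adds n itself."""
    total = 0
    for i in range(n + 1):
        total += i
    return total


def find_counterexample(
    f: Callable[[int], int],
    g: Callable[[int], int],
    inputs: Iterable[int],
) -> Optional[int]:
    """Return the first tested input where f and g disagree, else None."""
    for x in inputs:
        if f(x) != g(x):
            return x
    return None


if __name__ == "__main__":
    tested = range(-5, 50)
    # No disagreement found on the tested range (not a proof of equivalence).
    print(find_counterexample(sum_loop, sum_closed_form, tested))
    # A differing input shows the two programs are not equivalent.
    print(find_counterexample(sum_loop, sum_off_by_one, tested))
```

The pair that looks alike is the inequivalent one, which is the kind of case where judging by syntactic similarity would give the wrong answer.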
2024
Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline
Dingyi Yang | Chunru Zhan | Ziheng Wang | Biao Wang | Tiezheng Ge | Bo Zheng | Qin Jin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Video storytelling is engaging multimedia content that utilizes video and its accompanying narration to share a story and attract the audience, where a key challenge is creating narrations for recorded visual scenes. Previous studies on dense video captioning and video story generation have made some progress. However, in practical applications, we typically require synchronized narrations for ongoing visual scenes. In this work, we introduce a new task of Synchronized Video Storytelling, which aims to generate synchronous and informative narrations for videos. These narrations, associated with each video clip, should relate to the visual content, integrate relevant knowledge, and have an appropriate word count corresponding to the clip’s duration. Specifically, a structured storyline is beneficial to guide the generation process, ensuring coherence and integrity. To support the exploration of this task, we introduce a new benchmark dataset E-SyncVidStory with rich annotations. Since existing Multimodal LLMs are not effective in addressing this task in one-shot or few-shot settings, we propose a framework named VideoNarrator that can generate a storyline for input videos and simultaneously generate narrations with the guidance of the generated or predefined storyline. We further introduce a set of evaluation metrics to thoroughly assess the generation. Both automatic and human evaluations validate the effectiveness of our approach. Our dataset, codes, and evaluations will be released.
2023
Movie101: A New Movie Understanding Benchmark
Zihao Yue | Qi Zhang | Anwen Hu | Liang Zhang | Ziheng Wang | Qin Jin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
To help the visually impaired enjoy movies, automatic movie narrating systems are expected to narrate accurate, coherent, and role-aware plots when no actors are speaking. Existing works benchmark this challenge as a normal video captioning task via some simplifications, such as removing role names and evaluating narrations with n-gram-based metrics, which makes it difficult for automatic systems to meet the needs of real application scenarios. To narrow this gap, we construct a large-scale Chinese movie benchmark, named Movie101. Closer to real scenarios, the Movie Clip Narrating (MCN) task in our benchmark asks models to generate role-aware narration paragraphs for complete movie clips where no actors are speaking. External knowledge, such as role information and movie genres, is also provided for better movie understanding. In addition, we propose a new metric called Movie Narration Score (MNScore) for movie narrating evaluation, which achieves the best correlation with human evaluation. Our benchmark also supports the Temporal Narration Grounding (TNG) task to investigate clip localization given text descriptions. For both tasks, our proposed methods leverage external knowledge well and outperform carefully designed baselines. The dataset and code are released at https://github.com/yuezih/Movie101.
2022
MovieUN: A Dataset for Movie Understanding and Narrating
Qi Zhang | Zihao Yue | Anwen Hu | Ziheng Wang | Qin Jin
Findings of the Association for Computational Linguistics: EMNLP 2022
Automatic movie narration generation and narration grounding are very important for providing a true movie experience to the blind and visually impaired. To tell the movie story well, it is necessary to mention plot-related details (such as character names) and keep the narrations in a plot coherent. Taking these two points into consideration, we construct a large-scale Chinese video benchmark from 101 movies for Movie Understanding and Narrating (MovieUN) to support the Movie Clip Narrating (MCN) task and the Temporal Narration Grounding (TNG) task. We split movies in MovieUN into movie clips according to plots, and pair them with corresponding narrations provided by the movie narrators. Ultimately, the TNG task involves 3,253 long video clips totaling 179 hours. The MCN task contains 33,060 video clips totaling 105 hours. We benchmark state-of-the-art video captioning models and temporal grounding models in MCN and TNG tasks, respectively. Furthermore, to accurately comprehend plots of different characters, we propose methods to incorporate portraits of actors as external knowledge in both tasks. The experimental results demonstrate the effectiveness of our proposed methods. The dataset and code are released at https://github.com/yuezih/MovieUN.
2020
Structured Pruning of Large Language Models
Ziheng Wang | Jeremy Wohlwend | Tao Lei
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Large language models have recently achieved state-of-the-art performance across a wide variety of natural language tasks. Meanwhile, the size of these models and their latency have significantly increased, which makes their usage costly, and raises an interesting question: do language models need to be large? We study this question through the lens of model compression. We present a generic, structured pruning approach by parameterizing each weight matrix using its low-rank factorization, and adaptively removing rank-1 components during training. On language modeling tasks, our structured approach outperforms other unstructured and block-structured pruning baselines at various compression levels, while achieving significant speedups during both training and inference. We also demonstrate that our method can be applied to pruning adaptive word embeddings in large language models, and to pruning the BERT model on several downstream fine-tuning classification benchmarks.
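The idea of writing a weight matrix as a sum of rank-1 components and then dropping some of them can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the abstract says components are removed adaptively during training, whereas this sketch substitutes a simple keep-the-largest-magnitude rule applied once. The names `compose`, `prune_components`, `param_count`, and the per-component scalar `gates` are all hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def compose(p_cols: List[Vector], gates: Vector, q_rows: List[Vector]) -> Matrix:
    """Rebuild W as sum_k gates[k] * outer(p_cols[k], q_rows[k])."""
    rows, cols = len(p_cols[0]), len(q_rows[0])
    w = [[0.0] * cols for _ in range(rows)]
    for p, g, q in zip(p_cols, gates, q_rows):
        for i in range(rows):
            for j in range(cols):
                w[i][j] += g * p[i] * q[j]
    return w


def prune_components(
    p_cols: List[Vector], gates: Vector, q_rows: List[Vector], keep: int
) -> Tuple[List[Vector], Vector, List[Vector]]:
    """Keep the `keep` rank-1 components with the largest |gate| (assumed rule)."""
    ranked = sorted(range(len(gates)), key=lambda k: abs(gates[k]), reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original component order
    return (
        [p_cols[k] for k in kept],
        [gates[k] for k in kept],
        [q_rows[k] for k in kept],
    )


def param_count(p_cols: List[Vector], gates: Vector, q_rows: List[Vector]) -> int:
    """Number of stored scalars in the factored form."""
    return sum(len(p) for p in p_cols) + len(gates) + sum(len(q) for q in q_rows)


# Toy 6x6 matrix expressed with three rank-1 components.
P = [[1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.5, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5, 0.0, 0.0]]
Q = [[0.5, 1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0, 0.5]]
G = [2.0, 0.01, -1.5]

if __name__ == "__main__":
    P2, G2, Q2 = prune_components(P, G, Q, keep=2)
    full, pruned = compose(P, G, Q), compose(P2, G2, Q2)
    max_diff = max(abs(a - b) for ra, rb in zip(full, pruned) for a, b in zip(ra, rb))
    print("kept gates:", G2)
    print("stored scalars:", param_count(P, G, Q), "->", param_count(P2, G2, Q2))
    print("max entry change:", max_diff)
```

Because whole rank-1 components are removed, the pruned matrix stays in factored form with smaller factors, so the compression is structured rather than a scattering of individual zeroed weights.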
Co-authors
- Qin Jin 3
- Anwen Hu 2
- Zihao Yue 2
- Alex Aiken 1
- Jiannan Cao 1
- Hongyu Chen 1
- Qifeng Chen 1
- Tiezheng Ge 1
- Bin Guo 1
- Xueting Han 1
- Qingfeng He 1
- Tao Lei 1
- Ran Li 1
- Yuan Liu 1
- Yunhao Liu 1
- Ke Ma 1
- Jiaqi Tang 1
- Thiago S. F. X. Teixeira 1
- Biao Wang 1
- Ke Wang 1
- Xu Wang 1
- Anjiang Wei 1
- Jeremy Wohlwend 1
- Ruonan Xu 1
- Dingyi Yang 1
- Diyi Yang 1
- Zhiwen Yu 1
- Chunru Zhan 1
- Liang Zhang 1
- Qi Zhang 1
- Qi Zhang 1
- Yuhui Zhang 1
- Bo Zheng 1