Zeyu Zhang
Other people with similar names: Zeyu Zhang , Zeyu Zhang , Zeyu Zhang
2025
PresentAgent: Multimodal Agent for Presentation Video Generation
Jingwei Shi
|
Zeyu Zhang
|
Biao Wu
|
Yanjie Liang
|
Meng Fang
|
Ling Chen
|
Yang Zhao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present PresentAgent, a multimodal agent that transforms long-form documents into narrated presentation videos. While existing approaches are limited to generating static slides or text summaries, our method advances beyond these limitations by producing fully synchronized visual and spoken content that closely mimics human-style presentations. To achieve this integration, PresentAgent employs a modular pipeline that systematically segments the input document, plans and renders slide-style visual frames, generates contextual spoken narration with large language models and Text-to-Speech models, and seamlessly composes the final video with precise audio-visual alignment. Given the complexity of evaluating such multimodal outputs, we introduce PresentEval, a unified assessment framework powered by Vision-Language Models that comprehensively scores videos across three critical dimensions: content fidelity, visual clarity, and audience comprehension through prompt-based evaluation. Our experimental validation on a curated dataset of 30 document–presentation pairs demonstrates that PresentAgent approaches human-level quality across all evaluation metrics. These results highlight the significant potential of controllable multimodal agents in transforming static textual materials into dynamic, effective, and accessible presentation formats.
Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies
Zirui Song
|
Guangxian Ouyang
|
Meng Fang
|
Hongbin Na
|
Zijing Shi
|
Zhenhao Chen
|
Fu Yujie
|
Zeyu Zhang
|
Shiyu Jiang
|
Miao Fang
|
Ling Chen
|
Xiuying Chen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Existing household robots have made significant progress in performing routine tasks, such as cleaning floors or delivering objects. However, a key limitation of these robots is their inability to recognize potential problems or dangers in home environments. For example, a child may pick up and ingest medication that has fallen on the floor, posing a serious risk. We argue that household robots should proactively detect such hazards or anomalies within the home, and propose the task of anomaly scenario generation. To accomplish this task, we leverage foundational models instead of relying on manually labeled data to build simulated environments. Specifically, we introduce a multi-agent brainstorming approach, where agents collaborate and generate diverse scenarios covering household hazards, hygiene management, and child safety. These textual task descriptions are then integrated with designed 3D assets to simulate realistic environments. Within these constructed environments, our LLM-based robotic agent learns the necessary skills to proactively discover and handle the proposed anomalies through task decomposition, optimal learning approach selection. We demonstrate that our generated environment outperforms others in terms of task description and scene diversity, ultimately enabling robotic agents to better address potential household hazards.