Li Liu
2026
CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation
Yanting Li | Zhuoyang Jiang | Enyan Dai | Lei Wang | Wen-Cai Ye | Li Liu
Findings of the Association for Computational Linguistics: ACL 2026
Goal-directed molecular generation requires satisfying heterogeneous constraints such as protein–ligand compatibility and multi-objective drug-like properties, yet existing methods often optimize these constraints in isolation, failing to reconcile conflicting objectives (e.g., affinity vs. safety), and struggle to navigate the non-differentiable chemical space without compromising structural validity. To address these challenges, we propose CAGenMol, a condition-aware discrete diffusion framework over molecular sequences that formulates molecular design as conditional denoising guided by heterogeneous structural and property signals. By coupling discrete diffusion with reinforcement learning, the model aligns the generation trajectory with non-differentiable objectives while preserving chemical validity and diversity. The non-autoregressive nature of diffusion language models further enables iterative refinement of molecular fragments at inference time. Experiments on structure-conditioned, property-conditioned, and dual-conditioned benchmarks demonstrate consistent improvements over state-of-the-art methods in binding affinity, drug-likeness, and success rate, highlighting the effectiveness of our framework. The code is available at https://github.com/Lee612-1/CAGenMol.
2025
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey
Tianxin Xie | Yan Rong | Pengfei Zhang | Wenwu Wang | Li Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Text-to-speech (TTS) has advanced from generating natural-sounding speech to enabling fine-grained control over attributes like emotion, timbre, and style. Driven by rising industrial demand and breakthroughs in deep learning, e.g., diffusion and large language models (LLMs), controllable TTS has become a rapidly growing research area. This survey provides the first comprehensive review of controllable TTS methods, from traditional control techniques to emerging approaches using natural language prompts. We categorize model architectures, control strategies, and feature representations, while also summarizing challenges, datasets, and evaluations in controllable TTS. This survey aims to guide researchers and practitioners by offering a clear taxonomy and highlighting future directions in this fast-evolving field. A comprehensive paper list and updates are available at https://github.com/imxtx/awesome-controllabe-speech-synthesis.
Orchestrating Audio: Multi-Agent Framework for Long-Video Audio Synthesis
Yehang Zhang | Xinli Xu | Xiaojie Xu | Doudou Zhang | Li Liu | Ying-Cong Chen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Video-to-audio synthesis, which generates synchronized audio for visual content, critically enhances viewer immersion and narrative coherence in film and interactive media. However, video-to-audio dubbing for long-form content remains an unsolved challenge due to dynamic semantic shifts, audio diversity, and the absence of dedicated datasets. While existing methods excel on short videos, they falter in long scenarios (e.g., movies) due to fragmented synthesis and inadequate cross-scene consistency. We propose LVAS-Agent, a multi-agent framework that offers a coordinated, multi-component approach to long-video audio generation. Our approach decomposes long-video synthesis into four steps: scene segmentation, script generation, audio design, and audio synthesis. To enable systematic evaluation, we introduce LVAS-Bench, the first benchmark with 207 professionally curated long videos spanning diverse scenarios. Experiments show that our method outperforms state-of-the-art V2A models in overall audio synthesis quality.