Tan Lee


2025

PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao | Lei He | Haohan Guo | Feng-Long Xie | Tan Lee
Findings of the Association for Computational Linguistics: ACL 2025

Existing automatic audio generation methods struggle to generate podcast-like audio programs effectively. The key challenges lie in generating in-depth content and in producing appropriate, expressive voices. This paper proposes PodAgent, a comprehensive framework for creating audio programs. PodAgent 1) generates informative topic-discussion content through a Host-Guest-Writer multi-agent collaboration system, 2) builds a voice pool for suitable voice-role matching, and 3) uses an LLM-enhanced speech synthesis method to generate expressive conversational speech. Given the absence of standardized evaluation criteria for podcast-like audio generation, we develop comprehensive assessment guidelines to evaluate the model's performance. Experimental results demonstrate PodAgent's effectiveness: it significantly surpasses direct GPT-4 generation in topic-discussion dialogue content, achieves an 87.4% voice-matching accuracy, and produces more expressive speech through LLM-guided synthesis. Demo page: https://podcast-agent.github.io/demo/. Source code: https://github.com/yujxx/PodAgent.

2009

Automatic Recognition of Cantonese-English Code-Mixing Speech
Joyce Y. C. Chan | Houwei Cao | P. C. Ching | Tan Lee
International Journal of Computational Linguistics & Chinese Language Processing, Volume 14, Number 3, September 2009

2007

Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification
Nengheng Zheng | Tan Lee | Ning Wang | P. C. Ching
International Journal of Computational Linguistics & Chinese Language Processing, Volume 12, Number 3, September 2007: Special Issue on Invited Papers from ISCSLP 2006

2006

Using Duration Information in Cantonese Connected-Digit Recognition
Yu Zhu | Tan Lee
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 1, March 2006: Special Issue on Human Computer Speech Processing

Modeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition
Tan Lee | Patgi Kam | Frank K. Soong
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 1, March 2006: Special Issue on Human Computer Speech Processing

2001

Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks
W. K. Lo | P. C. Ching | Tan Lee | Helen Meng
Proceedings of Research on Computational Linguistics Conference XIV