Wei Wang
Other people with similar names: Wei Wang, Wei Wang, Wei Wang, Wei Wang, Wei Wang, Wei Wang, Wei Wang, Wei Wang, Wei Wang, Wei Wang, Wei Wang
Unverified author pages with similar names: Wei Wang
2026
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
Yifan Yang | Bing Han | Hui Wang | Wei Wang | Ziyang Ma | Long Zhou | Zengrui Jin | Guanrou Yang | Tianrui Wang | Xu Tan | Xie Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Modeling fine-grained speaking styles remains challenging for language-speech representation pre-training, as existing speech-text models are typically trained with coarse captions or task-specific supervision, and scalable fine-grained style annotations are unavailable. We present FCaps, a large-scale dataset with fine-grained free-text style descriptions, encompassing 47k hours of speech and 19M fine-grained captions annotated via a novel end-to-end pipeline that directly grounds detailed captions in audio, thereby avoiding the error propagation caused by LLM-based rewriting in existing cascaded pipelines. Evaluations using LLM-as-a-judge demonstrate that our annotations surpass existing cascaded annotations in terms of correctness, coverage, and naturalness. Building on FCaps, we propose CLSP, a contrastive language-speech pre-trained model that integrates global and fine-grained supervision, enabling unified representations across multiple granularities. Extensive experiments demonstrate that CLSP learns fine-grained and multi-granular speech-text representations that perform reliably across global and fine-grained speech-text retrieval, zero-shot paralinguistic classification, and speech style similarity scoring, with strong alignment to human judgments. Code and dataset are publicly available at https://github.com/yfyeung/CLSP.
Beyond Sentence-level Labels: Integrating Conversational Context and Personal Experience for Natural Emotional Expression
Haiyang Sun | Chenyang Le | Wei Wang | Leying Zhang | Chuang Li | Bing Han | Chenda Li | Mengxiao Bi | Yanmin Qian
Findings of the Association for Computational Linguistics: ACL 2026
Emotional Text-to-Speech aims to synthesize speech with human-like naturalness and expressiveness. However, existing systems rely on sentence-level labels, which fail to capture the subtle nuances of human affect. Based on cognitive appraisal theories, we argue that emotional expression is not generated in isolation but is deeply influenced by the speaker’s Personal Experience and the conversational Context. To overcome the information bottleneck inherent in traditional annotations, we present Emotional-Context-Speech, a large-scale, context-aware speech corpus derived from multi-speaker audiobooks. This dataset provides not only transcriptions but also dialogue context, personal experience, open-vocabulary emotion labels, and paralinguistic descriptions. Experimental results demonstrate that a TTS model trained with additional context and experience descriptions as inputs, called Emotional-Context-TTS, significantly outperforms existing methods in terms of emotional expression accuracy and naturalness.