Li Yue


2025

CDB: A Unified Framework for Hope Speech Detection Through Counterfactual, Desire and Belief
Tulio Ferreira Leite Da Silva | Gonzalo Freijedo Aduna | Farah Benamara | Alda Mari | Zongmin Li | Li Yue | Jian Su
Findings of the Association for Computational Linguistics: NAACL 2025

Computational modeling of user-generated desires on social media can significantly aid decision-makers across various fields. Initially explored through wish speech, this task has evolved into a nuanced examination of hope speech. To enhance understanding and detection, we propose a novel scheme rooted in formal semantics approaches to modality, capturing both future-oriented hopes through desires and beliefs and the counterfactuality of past unfulfilled wishes and regrets. We manually re-annotated existing hope speech datasets and built a new one, which together constitute a new benchmark in the field. We also explore the capabilities of LLMs in automatically detecting hope speech, relying on several prompting strategies. To the best of our knowledge, this is the first attempt towards a language-driven decomposition of the notional category of hope and its automatic detection in a unified setting.

2024

Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale
Wenzhen Zheng | Wenbo Pan | Xu Xu | Libo Qin | Li Yue | Ming Zhou
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In this paper, we explore an alternative approach to constructing an LLM for a new language by continually pre-training (CPT) from existing pre-trained LLMs, instead of using randomly initialized parameters. Based on parallel experiments on 40 model sizes ranging from 40M to 5B parameters, we find that 1) CPT converges faster and saves significant resources in a scalable manner. 2) CPT adheres to an extended scaling law derived with a joint data-parameter scaling term. 3) The compute-optimal data-parameter allocation for CPT markedly differs based on our estimated scaling factors. 4) The effectiveness of transfer at scale is influenced by training duration and linguistic properties, while remaining robust to data replaying, a method that effectively mitigates catastrophic forgetting in CPT. We hope our findings provide deeper insights into the transferability of LLMs at scale for the research community.
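As a point of reference for the "extended scaling law with a joint data-parameter scaling term" mentioned above, a Chinchilla-style loss formulation augmented with a cross term is one plausible shape such a law can take; this is an illustrative sketch only, and the constants and exponents here are placeholders, not the values or exact functional form estimated in the paper:

L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} + \frac{C}{N^{\gamma} D^{\delta}}

where N is the number of model parameters, D the number of training tokens, and E, A, B, C, \alpha, \beta, \gamma, \delta are fitted constants; the final term is the kind of joint data-parameter interaction the abstract refers to.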