2025
pdf
bib
abs
Colloquial Singaporean English Style Transfer with Fine-Grained Explainable Control
Jinggui Liang
|
Dung Vo
|
Yap Hong Xian
|
Hai Leong Chieu
|
Kian Ming A. Chai
|
Jing Jiang
|
Lizi Liao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Colloquial Singaporean English (Singlish) is an informal English marked by a unique blend of languages reflecting Singapore’s multicultural identity. Style transfer between Singlish and Standard (formal) English is vital for various applications, yet existing methods often lack explainability and fine-grained control. To fill this gap, we contribute in two key ways. First, we construct a large, high-quality dataset of formal and informal sentences, annotated across six linguistic aspects—Syntax, Lexical Borrowing, Pragmatics, Prosody/Phonology, Emoticons/Punctuation, and Code-Switching—with detailed explanations. Starting with manually annotated cases, we scaled the dataset to 140K with ensured quality. Second, inspired by the “Society of Mind” theory, we propose a novel multi-agent framework where large language models (LLMs) act as expert agents for each linguistic aspect. These agents collaborate by iteratively generating, critiquing, and refining responses to achieve controlled, explainable style transfer. Both automatic metrics and human evaluations confirm that our method enables precise, interpretable transformations, advancing explainability in NLP for Singlish.
2024
pdf
bib
abs
A Survey of Ontology Expansion for Conversational Understanding
Jinggui Liang
|
Yuxia Wu
|
Yuan Fang
|
Hao Fei
|
Lizi Liao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
In the rapidly evolving field of conversational AI, Ontology Expansion (OnExp) is crucial for enhancing the adaptability and robustness of conversational agents. Traditional models rely on static, predefined ontologies, limiting their ability to handle new and unforeseen user needs. This survey paper provides a comprehensive review of the state-of-the-art techniques in OnExp for conversational understanding. It categorizes the existing literature into three main areas: (1) New Intent Discovery, (2) New Slot-Value Discovery, and (3) Joint OnExp. By examining the methodologies, benchmarks, and challenges associated with these areas, we highlight several emerging frontiers in OnExp to improve agent performance in real-world scenarios and discuss their corresponding challenges. This survey aspires to be a foundational reference for researchers and practitioners, promoting further exploration and innovation in this crucial domain.
pdf
bib
abs
Synergizing Large Language Models and Pre-Trained Smaller Models for Conversational Intent Discovery
Jinggui Liang
|
Lizi Liao
|
Hao Fei
|
Jing Jiang
Findings of the Association for Computational Linguistics: ACL 2024
In Conversational Intent Discovery (CID), Small Language Models (SLMs) struggle with overfitting to familiar intents and fail to label newly discovered ones. This issue stems from their limited grasp of semantic nuances and their intrinsically discriminative framework. Therefore, we propose Synergizing Large Language Models (LLMs) with pre-trained SLMs for CID (SynCID). It harnesses the profound semantic comprehension of LLMs alongside the operational agility of SLMs. By utilizing LLMs to refine both utterances and existing intent labels, SynCID significantly enhances the semantic depth, subsequently realigning these enriched descriptors within the SLMs’ feature space to correct cluster distortion and promote robust learning of representations. A key advantage is its capacity for the early identification of new intents, a critical aspect for deploying conversational agents successfully. Additionally, SynCID leverages the in-context learning strengths of LLMs to generate labels for new intents. Thorough evaluations across a wide array of datasets have demonstrated its superior performance over traditional CID methods.
pdf
bib
abs
Actively Learn from LLMs with Uncertainty Propagation for Generalized Category Discovery
Jinggui Liang
|
Lizi Liao
|
Hao Fei
|
Bobo Li
|
Jing Jiang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Generalized category discovery faces a key issue: the lack of supervision for new and unseen data categories. Traditional methods typically combine supervised pretraining with self-supervised learning to create models, and then employ clustering for category identification. However, these approaches tend to become overly tailored to known categories, failing to fully resolve the core issue. Hence, we propose to integrate the feedback from LLMs into an active learning paradigm. Specifically, our method innovatively employs uncertainty propagation to select data samples from high-uncertainty regions, which are then labeled using LLMs through a comparison-based prompting scheme. This not only eases the labeling task but also enhances accuracy in identifying new categories. Additionally, a soft feedback propagation mechanism is introduced to minimize the spread of inaccurate feedback. Experiments on various datasets demonstrate our framework’s efficacy and generalizability, significantly improving baseline models at a nominal average cost.
2023
pdf
bib
abs
ClusterPrompt: Cluster Semantic Enhanced Prompt Learning for New Intent Discovery
Jinggui Liang
|
Lizi Liao
Findings of the Association for Computational Linguistics: EMNLP 2023
The discovery of new intent categories from user utterances is a crucial task in expanding agent skills. The key lies in how to efficiently solicit semantic evidence from utterances and properly transfer knowledge from existing intents to new intents. However, previous methods laid too much emphasis on relations among utterances or clusters for transfer learning, while paying less attention to the usage of semantics. As a result, these methods suffer from in-domain over-fitting and often generate meaningless new intent clusters due to data distortion. In this paper, we present a novel approach called Cluster Semantic Enhanced Prompt Learning (CsePL) for discovering new intents. Our method leverages two-level contrastive learning with label semantic alignment to learn meaningful representations of intent clusters. These learned intent representations are then utilized as soft prompt initializations for discriminating new intents, reducing the dominance of existing intents. Extensive experiments conducted on three public datasets demonstrate the superiority of our proposed method. It not only outperforms existing methods but also suggests meaningful intent labels and enables early detection of new intents.