Hui Wang


2025

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5
Jiaming Zhou | Shiyao Wang | Shiwan Zhao | Jiabei He | Haoqin Sun | Hui Wang | Cheng Liu | Aobo Kong | Yujie Guo | Xi Yang | Yequan Wang | Yonghua Lin | Yong Qin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automatic speech recognition (ASR) systems have advanced significantly with models like Whisper, Conformer, and self-supervised frameworks such as Wav2vec 2.0 and HuBERT. However, developing robust ASR models for young children’s speech remains challenging due to differences in pronunciation, tone, and pace compared to adult speech. In this paper, we introduce a new Mandarin speech dataset focused on children aged 3 to 5, addressing the scarcity of resources in this area. The dataset comprises 41.25 hours of speech with carefully crafted manual transcriptions, collected from 397 speakers across various provinces in China, with balanced gender representation. We provide a comprehensive analysis of speaker demographics, speech duration distribution and geographic coverage. Additionally, we evaluate ASR performance on models trained from scratch, such as Conformer, as well as fine-tuned pre-trained models like HuBERT and Whisper, where fine-tuning demonstrates significant performance improvements. Furthermore, we assess speaker verification (SV) on our dataset, showing that, despite the challenges posed by the unique vocal characteristics of young children, the dataset effectively supports both ASR and SV tasks. This dataset is a valuable contribution to Mandarin child speech research and holds potential for applications in educational technology and child-computer interaction. It will be open-source and freely available for all academic purposes.
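
A minimal sketch (not the paper's pipeline) of how such a dataset is typically used on the evaluation side: transcribe held-out child-speech clips with a fine-tuned checkpoint and score them with character error rate (CER), the standard metric for Mandarin ASR. The checkpoint name, file paths, and transcripts below are placeholders.

```python
# Hedged illustration: score an ASR model on (audio, transcript) pairs with CER.
# "openai/whisper-small" stands in for a checkpoint fine-tuned on child speech.
from transformers import pipeline
import jiwer

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

samples = [  # hypothetical evaluation pairs
    ("clips/child_001.wav", "今天天气很好"),
    ("clips/child_002.wav", "我喜欢画画"),
]
refs = [ref for _, ref in samples]
hyps = [asr(path)["text"] for path, _ in samples]
print("CER:", jiwer.cer(refs, hyps))  # lower is better
```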

Error Comparison Optimization for Large Language Models on Aspect-Based Sentiment Analysis
Qianlong Wang | Keyang Ding | Hengxin Gao | Hui Wang | Ruifeng Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Supervised fine-tuning (SFT) has enabled large language models (LLMs) to achieve promising performance on a variety of tasks. However, this fine-tuning process only compares current predictions against the labels on each sample; it fails to perceive and understand erroneous outputs of differing severity, which may allow a large proportion of serious errors. This poses a problem for aspect-based sentiment analysis (ABSA), in which serious errors have a greater negative impact than acceptable ones. Humans tend to compare their mistakes to understand how severe each one is, and thereby avoid major bad decisions. Inspired by this, we propose a simple yet effective framework that perceives and understands the severity of different errors by learning from comparative error pairs. It uses the SFT model to produce multiple outputs for each sample and selects acceptable and severe errors based on acceptability scores. Together with the labels, we construct two comparative error pairs and exploit their calibration losses to optimize parameters. We conduct comprehensive experiments on ABSA datasets to demonstrate the effectiveness of our framework over baselines.
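
The ordering the abstract describes (gold label scored above an acceptable error, acceptable error scored above a severe one) can be pictured as a pair of margin losses over sequence log-probabilities. The sketch below is one plausible reading, not the authors' implementation; the margin value and inputs are invented.

```python
# Sketch of a pairwise calibration loss over comparative error pairs.
import torch
import torch.nn.functional as F

def calibration_loss(lp_gold, lp_acceptable, lp_severe, margin=1.0):
    """lp_* are sequence log-probabilities under the current model."""
    rank_1 = F.relu(margin - (lp_gold - lp_acceptable))    # gold > acceptable
    rank_2 = F.relu(margin - (lp_acceptable - lp_severe))  # acceptable > severe
    return (rank_1 + rank_2).mean()

lp_gold, lp_acc, lp_sev = map(torch.tensor, ([-2.0], [-3.5], [-7.0]))
print(calibration_loss(lp_gold, lp_acc, lp_sev))
```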

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation
Luyao Cheng | Hui Wang | Chong Deng | Siqi Zheng | Yafeng Chen | Rongjie Huang | Qinglin Zhang | Qian Chen | Xihao Li | Wen Wang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Speaker diarization aims to segment an audio stream into homogeneous partitions based on speaker identity, playing a crucial role in speech comprehension and analysis. Mainstream speaker diarization systems rely only on acoustic information, making the task particularly challenging in complex acoustic environments in real-world applications. Recently, significant efforts have been devoted to audio-visual or audio-semantic multimodal modeling to enhance speaker diarization performance; however, these approaches still struggle to address the complexities of speaker diarization on spontaneous and unstructured multi-party conversations. To fully exploit meaningful dialogue patterns, we propose a novel multimodal approach that jointly utilizes audio, visual, and semantic cues to enhance speaker diarization. Our approach structures visual cues among active speakers and semantic cues in spoken content into a cohesive format known as pairwise constraints, and employs a semi-supervised clustering technique based on pairwise constrained propagation. Extensive experiments conducted on multiple multimodal datasets demonstrate that our approach effectively integrates audio-visual-semantic information into the clustering process for acoustic speaker embeddings and consistently outperforms state-of-the-art speaker diarization methods, while largely preserving the overall system framework.
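
To make the "pairwise constraints" idea concrete, here is a toy sketch (invented data, not the paper's semi-supervised propagation algorithm): visual and semantic cues become must-link / cannot-link pairs that re-weight the affinity matrix of acoustic speaker embeddings before clustering.

```python
# Toy constrained-clustering sketch for speaker diarization.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

np.random.seed(0)
emb = np.random.rand(6, 192)              # acoustic embeddings for 6 segments
aff = cosine_similarity(emb)

must_link = [(0, 1)]                      # e.g., same on-screen active speaker
cannot_link = [(0, 5)]                    # e.g., semantics signal a speaker turn
for i, j in must_link:
    aff[i, j] = aff[j, i] = 1.0
for i, j in cannot_link:
    aff[i, j] = aff[j, i] = 0.0

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(aff)
print(labels)                             # cluster index = speaker identity
```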

CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference
Jinglong Luo | Guanzhong Chen | Yehong Zhang | Shiyu Liu | Hui Wang | Yue Yu | Xun Zhou | Yuan Qi | Zenglin Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the growing deployment of pre-trained models like Transformers on cloud platforms, privacy concerns about model parameters and inference data are intensifying. Existing Privacy-Preserving Transformer Inference (PPTI) frameworks face the “impossible trinity” of balancing privacy, efficiency, and performance: Secure Multi-Party Computation (SMPC)-based approaches ensure strong privacy but suffer from high computational overhead and performance losses; conversely, permutation-based methods achieve near-plaintext efficiency and accuracy but compromise privacy by exposing sensitive model parameters and intermediate results. Bridging this gap with a single approach presents substantial challenges, motivating the introduction of CENTAUR, a groundbreaking PPTI framework that seamlessly integrates random permutations and SMPC to address the “impossible trinity”. By designing efficient PPTI algorithms tailored to the structural properties of Transformer models, CENTAUR achieves an unprecedented balance among privacy, efficiency, and performance. Our experiments demonstrate CENTAUR’s ability to resist diverse data reconstruction attacks, achieve plaintext-level inference accuracy, and boost inference speed by 5.0 to 30.4 times, unlocking new possibilities for secure and efficient AI deployment.
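
The permutation side of the trade-off can be illustrated with plain linear algebra: a secret column permutation hides a weight matrix's layout from the compute party at essentially no accuracy or speed cost. This is a simplified sketch of the general idea, not CENTAUR's actual protocol (which combines permutations with SMPC).

```python
# Sketch: outsource y = x @ W while hiding W's true column order.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 16))

perm = rng.permutation(16)       # secret, kept by the model owner
W_perm = W[:, perm]              # what the untrusted party receives
y_perm = x @ W_perm              # computed remotely

y = y_perm[:, np.argsort(perm)]  # owner undoes the permutation locally
assert np.allclose(y, x @ W)
```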

Investigating noun-noun compound relation representations in autoregressive large language models
Saffron Kendrick | Mark Ormerod | Hui Wang | Barry Devereux
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

This paper uses autoregressive large language models to explore at which points in a given input sentence the semantic information is decodable. Using representational similarity analysis and probing, the results show that autoregressive models are capable of extracting the semantic relation information from a dataset of noun-noun compounds. When considering the effect of processing the head and modifier nouns in context, the extracted representations show greater correlation after processing both constituent nouns in the same sentence. The linguistic properties of the head nouns may influence the ability of LLMs to extract relation information when the head and modifier words are processed separately. Probing suggests that Phi-1 and LLaMA-3.2 are exposed to relation information during training, as they are able to predict the relation vectors for compounds from separate word representations to a similar degree as using compositional compound representations. However, the difference in processing condition for GPT-2 and DeepSeek-R1 indicates that these models are actively processing the contextual semantic relation information of the compound.
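
As a generic illustration of the probing methodology (not the paper's exact setup), one can fit a linear probe from LM hidden states to relation vectors and measure held-out fit; the arrays below are random stand-ins for real representations.

```python
# Generic linear-probe sketch: hidden states -> semantic relation vectors.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))   # stand-in compound representations
Y = rng.normal(size=(200, 50))    # stand-in relation vectors
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

probe = Ridge(alpha=1.0).fit(X_tr, Y_tr)
print("held-out R^2:", probe.score(X_te, Y_te))  # chance level here, by design
```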

SVD-GCL: A Noise-Augmented Hybrid Graph Contrastive Learning Framework for Recommendation
Liping Wang | Shichao Li | Hui Wang | Yuyan Gao | Mingyao Wei
Proceedings of the 31st International Conference on Computational Linguistics

Recently, deep graph neural networks (GNNs) have emerged as the predominant architecture for recommender systems based on collaborative filtering. Nevertheless, numerous GNN-based approaches confront challenges such as complex computations and skewed feature distributions, especially with high-dimensional, sparse, and noisy data, making it difficult to accurately capture user preferences. To tackle these issues, we introduce SVD-GCL, a streamlined graph contrastive learning recommendation model based on noise augmentation that integrates truncated singular value decomposition in the feature engineering stage. This hybrid optimization approach reduces the dimensionality and denoises the original data. Through extracting self-supervised signals and gradually adding noise to embeddings in the training phase to enrich data samples, the data sparsity is effectively alleviated. Experimental outcomes on three large public benchmark datasets illustrate that SVD-GCL effectively manages high-dimensional sparse data, remains stable in the presence of noise, and provides significant advantages in computational efficiency, recommendation performance, and robustness.
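
The truncated-SVD step the abstract describes can be sketched in a few lines: factor the sparse user-item interaction matrix and keep only the top-k singular directions as denoised, low-dimensional user and item factors. Shapes and k are arbitrary here, and this covers only the feature-engineering stage, not the full contrastive model.

```python
# Sketch: truncated SVD as denoising dimensionality reduction for CF.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

R = sparse_random(1000, 500, density=0.01, format="csr", random_state=0)
U, s, Vt = svds(R, k=64)      # rank-64 approximation of the interaction matrix
user_emb = U * s              # denoised user factors, shape (1000, 64)
item_emb = Vt.T               # denoised item factors, shape (500, 64)
print(user_emb.shape, item_emb.shape)
```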

GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models
Zike Yuan | Ming Liu | Hui Wang | Bing Qin
Proceedings of the 31st International Conference on Computational Linguistics

Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs’ graph comprehension and reasoning. GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure and heterogeneous graphs, subdividing capabilities into 10 distinct areas tested through 19 tasks. Our benchmark includes 11 datasets with 5,140 graphs of varying complexity. We evaluate four closed-source and eight open-source LLMs, conducting thorough analyses from both ability and task perspectives. Key findings reveal that the OpenAI o1 model exhibits exceptionally strong comprehension and reasoning capabilities, that semantic enrichment enhances reasoning performance, that node ordering impacts task success, and that the ability to process longer texts does not necessarily improve graph comprehension or reasoning.

COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang | Lin Gui | Yu Lei | Yuanzhao Zhai | Yehong Zhang | Zhuo Zhang | Yulan He | Hui Wang | Yue Yu | Kam-Fai Wong | Bin Liang | Ruifeng Xu
Findings of the Association for Computational Linguistics: ACL 2025

Reinforcement Learning from Human Feedback (RLHF) is effective for aligning Large Language Models (LLMs) with human preferences. However, RLHF’s complex process limits its ability to continually learn from human feedback, making it impractical for real-world applications where the deployed model continuously receives user feedback. Non-RL methods such as Direct Preference Optimization (DPO) are not inherently well-suited to Continual Learning (CL). We observe that when combined with Experience Replay (ER) for CL, DPO tends to significantly widen the probability gap between human-preferred and dispreferred responses. Consequently, this diminishes diversity in model generation, potentially leading to model collapse. To overcome these challenges, we propose Continual Optimal Policy Regularization (COPR), a novel non-RL offline method that converts historical optimal policies into optimization constraints when continually learning new preferences. We first derive a moderate reward function from the pairwise ranking loss and then use this moderate reward to calculate a new sampling distribution, from which we construct novel learning objectives and constraints. We also provide a formal proof of the learnability of COPR. Experimental results show that COPR outperforms strong CL baselines on our proposed benchmark in terms of reward-based metrics, GPT-4 evaluation, and human assessment.
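
One ingredient the abstract names, a "moderate" reward derived from the pairwise ranking loss, can be pictured as a squashed Bradley-Terry margin that is then normalized into a sampling distribution over candidates. The sketch below is a loose reading of that idea with invented numbers, not the paper's derivation.

```python
# Sketch: a moderate reward from a pairwise-ranking view, then a sampling dist.
import torch

def moderate_reward(lp_preferred, lp_dispreferred, beta=0.1):
    # sigmoid tempers extreme log-prob gaps, keeping the reward "moderate"
    return torch.sigmoid(beta * (lp_preferred - lp_dispreferred))

lp_w = torch.tensor([-3.0, -2.0])   # log-probs of preferred responses
lp_l = torch.tensor([-5.0, -6.0])   # log-probs of dispreferred responses
r = moderate_reward(lp_w, lp_l)
print(r, torch.softmax(r, dim=0))   # rewards and derived sampling weights
```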

Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration
Ang Li | Jingqian Zhao | Bin Liang | Lin Gui | Hui Wang | Xi Zeng | Xingwei Liang | Kam-Fai Wong | Ruifeng Xu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Stance detection is critical for understanding the underlying position or attitude expressed toward a topic. Large language models (LLMs) have demonstrated significant advancements across various natural language processing tasks, including stance detection; however, their performance in stance detection is limited by biases and spurious correlations inherent in their data-driven nature. Our statistical experiments reveal that LLMs are prone to generating biased stances due to sentiment-stance spurious correlations and preferences toward certain individuals and topics. Furthermore, the results demonstrate a strong negative correlation between stance bias and stance detection performance, underscoring the importance of mitigating bias to enhance the utility of LLMs in stance detection. Therefore, in this paper we propose a Counterfactual Augmented Calibration Network (FACTUAL), in which a novel calibration network is devised to calibrate potential bias in the stance predictions of LLMs. Further, to address the challenges of effectively learning bias representations and of generalizing debiasing, we construct counterfactual augmented data. This approach enhances the calibration network, facilitating debiasing and out-of-domain generalization. Experimental results on in-target and zero-shot stance detection tasks show that the proposed FACTUAL can effectively mitigate the biases of LLMs, achieving state-of-the-art results.

2024

SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC
Jinglong Luo | Yehong Zhang | Zhuo Zhang | Jiaqi Zhang | Xin Mu | Hui Wang | Yue Yu | Zenglin Xu
Findings of the Association for Computational Linguistics: ACL 2024

A Lifelong Multilingual Multi-granularity Semantic Alignment Approach via Maximum Co-occurrence Probability
Xin Liu | Hongwei Sun | Shaojie Dai | Bo Lv | Youcheng Pan | Hui Wang | Yue Yu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Cross-lingual pre-training methods mask and predict tokens in multilingual text to generalize diverse multilingual information. However, due to the lack of sufficient aligned multilingual resources during pre-training, these methods may not fully exploit the multilingual correlations of masked tokens, limiting multilingual information interaction. In this paper, we propose a lifelong multilingual multi-granularity semantic alignment approach, which continuously extracts massive aligned linguistic units from noisy data via a maximum co-occurrence probability algorithm. The approach then releases a version of the multilingual multi-granularity semantic alignment resource, supporting seven languages: English, Czech, German, Russian, Romanian, Hindi, and Turkish. Finally, we show how to use this resource to improve translation performance on the WMT14-18 benchmarks in twelve directions. Experimental results show average improvements of 0.3 to 1.1 BLEU across all translation benchmarks. The analysis and discussion also demonstrate the superiority and potential of the proposed approach. The resource used in this work will be made publicly available.
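
In spirit, the maximum co-occurrence probability step counts how often candidate unit pairs co-occur across parallel sentences and keeps the most probable pairings. The following is a deliberately simplified sketch on toy data, not the paper's algorithm.

```python
# Toy sketch: pick aligned word pairs by maximum co-occurrence probability.
from collections import Counter
from itertools import product

parallel = [("the cat sleeps", "die katze schläft"),
            ("the cat eats", "die katze frisst")]

pair_counts, src_counts = Counter(), Counter()
for src, tgt in parallel:
    for s, t in product(src.split(), tgt.split()):
        pair_counts[(s, t)] += 1
    src_counts.update(src.split())

prob = {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}
print(max(prob, key=prob.get))   # a high-probability aligned pair
```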

Gradient Consistency-based Parameter Allocation for Multilingual Neural Machine Translation
Wenshuai Huo | Xiaocheng Feng | Yichong Huang | Chengpeng Fu | Hui Wang | Bing Qin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multilingual neural machine translation handles the translation of multiple languages with one unified model. However, this joint-training paradigm incurs the notorious issue of parameter interference, where the model compromises with the language diversity to find a common solution. Recent research has explored avoiding this problem by selecting certain parameters for each language direction from the original model to form language-specific sub-networks. However, determining how many parameters to choose and which parameters to select is still a serious challenge. In this work, we propose an approach called CaPA (Consistency-based Parameter Allocation), which dynamically allocates parameters of appropriate scale to each language direction based on the consistency between the gradient of the individual language and the average gradient. Specifically, CaPA allocates more parameters to languages with higher gradient consistency as these languages tend to have a more positive impact on other languages. Furthermore, considering the varying levels of interference across different parts of the model, we propose an adaptive parameter allocation based on module-level gradient consistency. Experimental results show the correlation between gradient consistency and parameter interference, as well as the effectiveness of our proposed method.
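
The core signal is easy to sketch: flatten each language direction's gradient, compare it with the average gradient by cosine similarity, and let the score drive the parameter budget. The data below is random and the allocation rule is only indicative, not CaPA itself.

```python
# Sketch: per-language gradient consistency with the average gradient.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
grads = {lang: torch.randn(1000) for lang in ("de", "fr", "zh")}
avg = torch.stack(list(grads.values())).mean(dim=0)

consistency = {lang: F.cosine_similarity(g, avg, dim=0).item()
               for lang, g in grads.items()}
print(consistency)   # higher consistency -> larger language-specific budget
```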

Revisiting Data Reconstruction Attacks on Real-world Dataset for Federated Natural Language Understanding
Zhuo Zhang | Jintao Huang | Xiangjing Hu | Jingyuan Zhang | Yating Zhang | Hui Wang | Yue Yu | Qifan Wang | Lizhen Qu | Zenglin Xu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

With the growing privacy concerns surrounding natural language understanding (NLU) applications, the need to train high-quality models while safeguarding data privacy has reached unprecedented importance. Federated learning (FL) offers a promising approach to collaborative model training by exchanging model gradients. However, many studies show that eavesdroppers in FL could develop sophisticated data reconstruction attacks (DRA) to accurately reconstruct clients’ data from the shared gradients. Regrettably, current DRA methods in federated NLU have mostly been evaluated on public datasets, lacking a comprehensive evaluation on real-world privacy datasets. To address this limitation, this paper presents a pioneering study that reexamines the performance of these DRA methods as well as the corresponding defense methods. Specifically, we introduce a novel real-world privacy dataset called FedAttack, which leads to a significant discovery: existing DRA methods usually fail to accurately recover the original text of real-world privacy data. In detail, the tokens within a recovered sentence are disordered and intertwined with tokens from other sentences in the same training batch. Moreover, our experiments demonstrate that the performance of DRA is also influenced by language and domain. These findings lay a solid foundation for further research into more practical DRA methods and corresponding defenses.

Incremental pre-training from smaller language models
Han Zhang | Hui Wang | Ruifeng Xu
Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10)

Large language models have recently become a new learning paradigm and led to state-of-the-art performance across a range of tasks. With the explosion of available open-source pre-trained models, it is worth investigating how to better utilize existing models. We propose a simple yet effective method, Incr-Pretrain, for incrementally pre-training language models from smaller, well-trained source models. We introduce different layer-wise transfer strategies for model augmentation, including parameter copying, initial value padding, and model distillation. Experiments on multiple zero-shot learning tasks demonstrate satisfactory inference performance immediately after transfer and promising training efficiency during continued pre-training. Compared to training from scratch, Incr-Pretrain can save up to half the training time to reach a similar test loss.
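
A toy version of the "parameter copying + initial value padding" strategy: place a trained weight block from the smaller source model into the corner of a freshly initialized larger weight. Shapes and the init scale are invented for illustration.

```python
# Sketch: grow a weight matrix by copying a smaller trained block into it.
import torch

small = torch.randn(256, 256)            # trained weight from the source model
large = torch.empty(512, 512)
torch.nn.init.normal_(large, std=0.02)   # fresh init = "initial value padding"
large[:256, :256] = small                # parameter copying
print(large.shape)
```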

2023

FEDLEGAL: The First Real-World Federated Learning Benchmark for Legal NLP
Zhuo Zhang | Xiangjing Hu | Jingyuan Zhang | Yating Zhang | Hui Wang | Lizhen Qu | Zenglin Xu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The private information inevitably present in legal data necessitates that legal artificial intelligence adopt privacy-preserving and decentralized learning methods. Federated learning (FL) has emerged as a promising technique for multiple participants to collaboratively train a shared model while efficiently protecting their sensitive data. However, to the best of our knowledge, there is no work applying FL to legal NLP. To fill this gap, this paper presents the first real-world FL benchmark for legal NLP, coined FEDLEGAL, which comprises five legal NLP tasks and one privacy task based on data from Chinese courts. Extensive experiments on these datasets show that FL faces new challenges in terms of real-world non-IID data. The benchmark also encourages researchers to investigate privacy protection using real-world data in the FL setting, as well as deploying models in resource-constrained scenarios. The code and datasets of FEDLEGAL are available here.

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization
Luyao Cheng | Siqi Zheng | Qinglin Zhang | Hui Wang | Yafeng Chen | Qian Chen
Findings of the Association for Computational Linguistics: ACL 2023

Speaker diarization is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which results in performance degradation in adverse acoustic environments. In this paper, we propose methods to extract speaker-related information from the semantic content of multi-party meetings, which, as we will show, can further benefit speaker diarization. We introduce two sub-tasks, Dialogue Detection and Speaker-Turn Detection, through which we effectively extract speaker information from conversational semantics. We also propose a simple yet effective algorithm to jointly model acoustic and semantic information and obtain speaker-identified texts. Experiments on both the AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.

Enabling Unsupervised Neural Machine Translation with Word-level Visual Representations
Chengpeng Fu | Xiaocheng Feng | Yichong Huang | Wenshuai Huo | Hui Wang | Bing Qin | Ting Liu
Findings of the Association for Computational Linguistics: EMNLP 2023

Unsupervised neural machine translation has recently made remarkable strides, achieving impressive results with the exclusive use of monolingual corpora. Nonetheless, these methods still exhibit fundamental flaws, such as confusing similar words. A straightforward remedy is to employ bilingual dictionaries; however, high-quality bilingual dictionaries can be costly to obtain. To overcome this limitation, we propose a method that incorporates images at the word level to augment the lexical mappings. Specifically, our method inserts visual representations into the model, modifying the corresponding embedding-layer information. In addition, a visible matrix is adopted to isolate the impact of images on other, unrelated words. Experiments on the Multi30k dataset with over 300,000 self-collected images validate the effectiveness of our method in generating more accurate word translations, achieving an improvement of up to +2.81 BLEU, which is comparable or even superior to using bilingual dictionaries.
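
The visible matrix can be read as a boolean attention mask: an inserted visual token and its anchor word may attend to each other, while all other positions are screened off from the image. The token layout below is invented for illustration.

```python
# Sketch: a visible matrix isolating a word-level visual token.
import torch

tokens = ["a", "dog", "[IMG:dog]", "runs"]      # visual token follows "dog"
n = len(tokens)
visible = torch.ones(n, n, dtype=torch.bool)    # text tokens see each other

img, word = 2, 1
visible[img, :] = False                         # cut the image off by default
visible[:, img] = False
visible[img, word] = visible[word, img] = True  # ...except from its own word
visible[img, img] = True
print(visible.int())
```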

2022

CLLE: A Benchmark for Continual Language Learning Evaluation in Multilingual Machine Translation
Han Zhang | Sheng Zhang | Yang Xiang | Bin Liang | Jinsong Su | Zhongjian Miao | Hui Wang | Ruifeng Xu
Findings of the Association for Computational Linguistics: EMNLP 2022

Continual Language Learning (CLL) in multilingual translation is inevitable when new languages need to be translated. Due to the lack of unified and generalized benchmarks, the evaluation of existing methods is greatly influenced by experimental design, which usually has a big gap from industrial demands. In this work, we propose CLLE, the first Continual Language Learning Evaluation benchmark for multilingual translation. CLLE consists of a Chinese-centric corpus, CN-25, and two CLL tasks: a close-distance language continual learning task and a language family continual learning task, designed for real and disparate demands. Unlike existing translation benchmarks, CLLE considers several restrictions for CLL, including domain distribution alignment, content overlap, language diversity, and corpus balance. Furthermore, we propose COMETA, a novel framework based on Constrained Optimization and META-learning that alleviates catastrophic forgetting and dependency on historical training data by using a meta-model to retain the parameters important for old languages. Our experiments show that CLLE is a challenging CLL benchmark and that our proposed method is effective compared with other strong baselines. Because the corpus construction, task design, and evaluation method are independent of the centric language, we also construct and release the English-centric corpus EN-25 to facilitate academic research.

2017

FuRongWang at SemEval-2017 Task 3: Deep Neural Networks for Selecting Relevant Answers in Community Question Answering
Sheng Zhang | Jiajun Cheng | Hui Wang | Xin Zhang | Pei Li | Zhaoyun Ding
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we describe deep neural network frameworks that address the community question answering (cQA) ranking task (SemEval-2017 Task 3). Our methods apply convolutional neural networks and bidirectional long short-term memory networks to extract semantic information from questions and answers (comments). In addition, to take full advantage of question-comment semantic relevance, we deploy an interaction layer and augmented features before calculating the similarity. The results show that our methods are highly effective for both subtask A and subtask C.

2012

Identification of Social Acts in Dialogue
David Bracewell | Marc Tomlinson | Hui Wang
Proceedings of COLING 2012

2011

An Exploration into the Use of Contextual Document Clustering for Cluster Sentiment Analysis
Niall Rooney | Hui Wang | Fiona Browne | Fergal Monaghan | Jann Müller | Alan Sergeant | Zhiwei Lin | Philip Taylor | Vladimir Dobrynin
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Lexical Semantics-Syntactic Model for Defining and Subcategorizing Attribute Noun Class
Xiaopeng Bai | Hui Wang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

2005

從構式語法理論看漢語詞義研究 (A Construction-Based Approach to Chinese Lexical Semantics) [In Chinese]
Hui Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 4, December 2005: Special Issue on Selected Papers from CLSW-5

2003

The Semantic Knowledge-base of Contemporary Chinese and Its Applications in WSD
Hui Wang | Shiwen Yu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

A Large-scale Lexical Semantic Knowledge-base of Chinese
Hui Wang | Shiwen Yu
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

2002

基於組合特徵的漢語名詞詞義消歧 (A Study on Noun Sense Disambiguation Based on Syntagmatic Features) [In Chinese]
Hui Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 7, Number 2, August 2002: Special Issue on Computational Chinese Lexical Semantics