Hui Wang


2023

pdf
FEDLEGAL: The First Real-World Federated Learning Benchmark for Legal NLP
Zhuo Zhang | Xiangjing Hu | Jingyuan Zhang | Yating Zhang | Hui Wang | Lizhen Qu | Zenglin Xu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The inevitable private information in legal data necessitates legal artificial intelligence to study privacy-preserving and decentralized learning methods. Federated learning (FL) has merged as a promising technique for multiple participants to collaboratively train a shared model while efficiently protecting the sensitive data of participants. However, to the best of our knowledge, there is no work on applying FL to legal NLP. To fill this gap, this paper presents the first real-world FL benchmark for legal NLP, coined FEDLEGAL, which comprises five legal NLP tasks and one privacy task based on the data from Chinese courts. Based on the extensive experiments on these datasets, our results show that FL faces new challenges in terms of real-world non-IID data. The benchmark also encourages researchers to investigate privacy protection using real-world data in the FL setting, as well as deploying models in resource-constrained scenarios. The code and datasets of FEDLEGAL are available here.

pdf
Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization
Luyao Cheng | Siqi Zheng | Zhang Qinglin | Hui Wang | Yafeng Chen | Qian Chen
Findings of the Association for Computational Linguistics: ACL 2023

Speaker diarization is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which result in performance degradation when encountering adverse acoustic environment. In this paper, we propose methods to extract speaker-related information from semantic content in multi-party meetings, which, as we will show, can further benefit speaker diarization. We introduce two sub-tasks, Dialogue Detection and Speaker-Turn Detection, in which we effectively extract speaker information from conversational semantics. We also propose a simple yet effective algorithm to jointly model acoustic and semantic information and obtain speaker-identified texts.Experiments on both AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.

2022

pdf
CLLE: A Benchmark for Continual Language Learning Evaluation in Multilingual Machine Translation
Han Zhang | Sheng Zhang | Yang Xiang | Bin Liang | Jinsong Su | Zhongjian Miao | Hui Wang | Ruifeng Xu
Findings of the Association for Computational Linguistics: EMNLP 2022

Continual Language Learning (CLL) in multilingual translation is inevitable when new languages are required to be translated. Due to the lack of unified and generalized benchmarks, the evaluation of existing methods is greatly influenced by experimental design which usually has a big gap from the industrial demands. In this work, we propose the first Continual Language Learning Evaluation benchmark CLLE in multilingual translation. CLLE consists of a Chinese-centric corpus — CN-25 and two CLL tasks — the close-distance language continual learning task and the language family continual learning task designed for real and disparate demands. Different from existing translation benchmarks, CLLE considers several restrictions for CLL, including domain distribution alignment, content overlap, language diversity, and the balance of corpus. Furthermore, we propose a novel framework COMETA based on Constrained Optimization and META-learning to alleviate catastrophic forgetting and dependency on history training data by using a meta-model to retain the important parameters for old languages. Our experiments prove that CLLE is a challenging CLL benchmark and that our proposed method is effective when compared with other strong baselines. Due to the construction of the corpus, the task designing and the evaluation method are independent of the centric language, we also construct and release the English-centric corpus EN-25 to facilitate academic research.

2017

pdf
FuRongWang at SemEval-2017 Task 3: Deep Neural Networks for Selecting Relevant Answers in Community Question Answering
Sheng Zhang | Jiajun Cheng | Hui Wang | Xin Zhang | Pei Li | Zhaoyun Ding
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describes deep neural networks frameworks in this paper to address the community question answering (cQA) ranking task (SemEval-2017 task 3). Convolutional neural networks and bi-directional long-short term memory networks are applied in our methods to extract semantic information from questions and answers (comments). In addition, in order to take the full advantage of question-comment semantic relevance, we deploy interaction layer and augmented features before calculating the similarity. The results show that our methods have the great effectiveness for both subtask A and subtask C.

2012

pdf
Identification of Social Acts in Dialogue
David Bracewell | Marc Tomlinson | Hui Wang
Proceedings of COLING 2012

2011

pdf
An Exploration into the Use of Contextual Document Clustering for Cluster Sentiment Analysis
Niall Rooney | Hui Wang | Fiona Browne | Fergal Monaghan | Jann Müller | Alan Sergeant | Zhiwei Lin | Philip Taylor | Vladimir Dobrynin
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

pdf
Lexical Semantics-Syntactic Model for Defining and Subcategorizing Attribute Noun Class
Xiaopeng Bai | Hui Wang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

2005

pdf
從構式語法理論看漢語詞義研究 (A Construction-Bsed Approach to Chinese Lexical Semantics) [In Chinese]
Hui Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 4, December 2005: Special Issue on Selected Papers from CLSW-5

2003

pdf
The semantic Knowledge-base of Contemporary Chinese and Its Applications in WSD
Hui Wang | Shiwen Yu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

pdf
A Large-scale Lexical Semantic Knowledge-base of Chinese
Hui Wang | Shiwen Yu
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

2002

pdf
基於組合特徵的漢語名詞詞義消歧 (A Study on Noun Sense Disambiguation Based on Syntagmatic Features) [In Chinese]
Hui Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 7, Number 2, August 2002: Special Issue on Computational Chinese Lexical Semantics