Cong Wang


2023

Adaptive Gating in Mixture-of-Experts based Language Models
Jiamin Li | Qiang Su | Yitao Yang | Yimin Jiang | Cong Wang | Hong Xu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large language models have demonstrated exceptional language understanding capabilities in many NLP tasks. Sparsely activated mixture-of-experts (MoE) has emerged as a promising solution for scaling models while keeping the number of computational operations constant. Existing MoE models adopt a fixed gating network in which every token is processed by the same number of experts. This contradicts the intuition that tokens in a sequence vary in linguistic complexity and, consequently, warrant different computational costs. Prior research says little about the trade-off between per-token computation and model performance. This paper introduces adaptive gating in MoE, a flexible training strategy that allows tokens to be processed by a variable number of experts based on the expert probability distribution. Adaptive gating preserves sparsity while improving training efficiency. We further draw on curriculum learning to better order the training samples and maximize the training time savings. Extensive experiments on diverse NLP tasks show that adaptive gating reduces training time by up to 22.5% while maintaining inference quality. Moreover, we conduct a comprehensive analysis of the gating decisions and present our insights on which tokens are inherently difficult to process, depending on the specific language task.
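
The abstract describes routing each token to a variable number of experts based on the gate's probability distribution. The following is a minimal sketch, not the authors' implementation: it assumes a simple confidence rule in which a token gets one expert when the top gate probability clears a (hypothetical) threshold and two experts otherwise; the names `adaptive_gate`, `experts`, and `threshold` are illustrative only.

```python
# Minimal sketch (assumptions, not the paper's code): adaptive MoE gating.
import torch
import torch.nn.functional as F


def adaptive_gate(token_states, gate_weights, experts, threshold=0.5):
    """Route each token to 1 or 2 experts based on the gate distribution.

    token_states: (num_tokens, d_model) token representations.
    gate_weights: (d_model, num_experts) gating projection.
    experts: list of callables, one per expert (e.g. small FFNs).
    threshold: hypothetical confidence cutoff for single-expert routing.
    """
    probs = F.softmax(token_states @ gate_weights, dim=-1)   # (T, E)
    top2_p, top2_idx = probs.topk(2, dim=-1)                  # (T, 2)
    use_one_expert = top2_p[:, 0] >= threshold                # (T,)

    out = torch.zeros_like(token_states)
    for t in range(token_states.size(0)):
        k = 1 if use_one_expert[t] else 2
        # Renormalize the selected gate weights so they sum to 1.
        w = top2_p[t, :k] / top2_p[t, :k].sum()
        for j in range(k):
            out[t] += w[j] * experts[top2_idx[t, j].item()](token_states[t])
    return out
```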

Aggregating Multiple Heuristic Signals as Supervision for Unsupervised Automated Essay Scoring
Cong Wang | Zhiwei Jiang | Yafeng Yin | Zifeng Cheng | Shiping Ge | Qing Gu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automated Essay Scoring (AES) aims to evaluate the quality score of input essays. In this work, we propose a novel unsupervised AES approach, ULRA, which does not require ground-truth essay scores for training. The core idea of ULRA is to use multiple heuristic quality signals as the pseudo ground truth and then train a neural AES model by learning from the aggregation of these quality signals. To aggregate these inconsistent quality signals into a unified supervision, we view the AES task as a ranking problem and design a special Deep Pairwise Rank Aggregation (DPRA) loss for training. In the DPRA loss, we set a learnable confidence weight for each signal to address conflicts among signals, and train the neural AES model in a pairwise way to disentangle the cascade effect among partial-order pairs. Experiments on eight prompts of the ASAP dataset show that ULRA achieves state-of-the-art performance among unsupervised methods in both transductive and inductive settings. Further, our approach achieves performance comparable to many existing domain-adapted supervised models, demonstrating the effectiveness of ULRA. The code is available at https://github.com/tenvence/ulra.
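
The DPRA loss is described as assigning a learnable confidence weight to each heuristic signal and training the scorer on essay pairs. Below is a hedged sketch of one way such a loss could look, assuming a Bradley-Terry style pairwise formulation; the class name, tensor shapes, and softmax-normalized confidences are assumptions, not details taken from the paper.

```python
# Minimal sketch (assumptions, not the released ULRA code): a pairwise
# rank-aggregation loss where each heuristic signal k votes on the order of
# an essay pair (i, j) and carries a learnable confidence weight.
import torch
import torch.nn as nn


class PairwiseRankAggregationLoss(nn.Module):
    def __init__(self, num_signals):
        super().__init__()
        # One learnable confidence per heuristic quality signal.
        self.confidence = nn.Parameter(torch.zeros(num_signals))

    def forward(self, score_i, score_j, signal_prefs):
        """score_i, score_j: model scores for the two essays, shape (B,).
        signal_prefs: (B, K) in {0, 1}; 1 means signal k ranks essay i
        above essay j for that pair.
        """
        w = torch.softmax(self.confidence, dim=0)              # (K,)
        # Bradley-Terry style probability that essay i outranks essay j.
        p_ij = torch.sigmoid(score_i - score_j).unsqueeze(-1)  # (B, 1)
        # Confidence-weighted binary cross-entropy against each signal's vote.
        bce = -(signal_prefs * torch.log(p_ij + 1e-8)
                + (1 - signal_prefs) * torch.log(1 - p_ij + 1e-8))
        return (bce * w).sum(dim=-1).mean()
```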

2020

Social Emergency Event Judgement based on BiLSTM-CRF
Huijun Hu (胡慧君) | Cong Wang (王聪) | Jianhua Dai (代建华) | Maofu Liu (刘茂福)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

As part of emergency response, the classification and severity-level judgment of social emergency events is of self-evident importance. However, most existing studies identify judgment evidence with manual or rule-based methods, which are severely limited given the structural complexity of social emergency events and the flexibility of their linguistic descriptions. Drawing on the idea of event extraction, this paper treats event types and judgment evidence as event elements and recognizes them at a fine granularity with a BiLSTM-CRF model; the two are then combined, with the classification result serving as input to severity-level judgment, to identify the judgment evidence. Finally, the recognition results are combined with an attention mechanism to perform severity-level judgment, so that accurate identification of the judgment evidence improves the accuracy of the level judgment. Experiments show that, compared with manual or rule-based evidence identification, the proposed method is more robust and also achieves good results in judging social emergency events. Keywords: event classification; judgment evidence recognition; severity-level judgment; BiLSTM-CRF
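
The method tags event types and judgment evidence at a fine granularity with a BiLSTM-CRF. The sketch below is a generic BiLSTM-CRF tagger assumed for illustration (not the authors' code), using the third-party pytorch-crf package for the CRF layer; the BIO labeling scheme, layer sizes, and class name are assumptions.

```python
# Minimal sketch (an assumed, generic setup): BiLSTM-CRF sequence tagger
# for labeling judgment-evidence spans, e.g. with BIO tags.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class BiLSTMCRFTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True,
                              bidirectional=True)
        self.emit = nn.Linear(hidden_dim, num_tags)  # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, token_ids, tags=None, mask=None):
        h, _ = self.bilstm(self.embed(token_ids))
        emissions = self.emit(h)
        if tags is not None:
            # Negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(emissions, tags, mask=mask, reduction='mean')
        # Viterbi decoding of the most likely tag sequence.
        return self.crf.decode(emissions, mask=mask)
```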