Cheng Chen


2021

Learning to Bridge Metric Spaces: Few-shot Joint Learning of Intent Detection and Slot Filling
Yutai Hou | Yongkui Lai | Cheng Chen | Wanxiang Che | Ting Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Improving Punctuation Restoration for Speech Transcripts via External Data
Xue-Yong Fu | Cheng Chen | Md Tahmid Rahman Laskar | Shashi Bhushan | Simon Corston-Oliver
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Automatic Speech Recognition (ASR) systems generally do not produce punctuated transcripts. To make transcripts more readable and to match the expected input format of downstream language models, it is necessary to add punctuation marks. In this paper, we tackle the punctuation restoration problem specifically for noisy text (e.g., phone conversation scenarios). To leverage available written text datasets, we introduce a data sampling technique based on an n-gram language model that selects additional training data similar to our in-domain data. Moreover, we propose a two-stage fine-tuning approach for BERT-based models that utilizes the sampled external data as well as our in-domain dataset. Extensive experiments show that the proposed approach outperforms the baseline with an improvement of 1.12% in F1 score.
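
A minimal sketch of the perplexity-based data sampling idea described above, assuming a simple add-one-smoothed bigram model; the paper's actual n-gram order, smoothing scheme, and selection threshold are not specified here, so the function names and the `keep_ratio` parameter are illustrative assumptions.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized in-domain sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        tokens = ["<s>"] + tokens + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(tokens, unigrams, bigrams):
    """Per-token perplexity under the bigram model with add-one smoothing."""
    tokens = ["<s>"] + tokens + ["</s>"]
    vocab = len(unigrams)
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(tokens) - 1))


def sample_similar(external_sentences, unigrams, bigrams, keep_ratio=0.1):
    """Keep the external sentences the in-domain LM finds most predictable."""
    scored = sorted(external_sentences,
                    key=lambda s: perplexity(s, unigrams, bigrams))
    return scored[: max(1, int(len(scored) * keep_ratio))]
```

The retained external sentences would then be used in the first fine-tuning stage, with the in-domain dataset used in the second stage.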

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models
Yichun Yin | Cheng Chen | Lifeng Shang | Xin Jiang | Xiao Chen | Qun Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Pre-trained language models (PLMs) have achieved great success in natural language processing. Most PLMs follow BERT's default architecture hyper-parameters (e.g., the hidden dimension is a quarter of the intermediate dimension in the feed-forward sub-networks). Few studies have explored the design of these architecture hyper-parameters, especially for more efficient PLMs with tiny sizes, which are essential for practical deployment on resource-constrained devices. In this paper, we adopt one-shot Neural Architecture Search (NAS) to automatically search for architecture hyper-parameters. Specifically, we carefully design the one-shot learning techniques and the search space to provide an adaptive and efficient way to develop tiny PLMs for various latency constraints. We name our method AutoTinyBERT and evaluate its effectiveness on the GLUE and SQuAD benchmarks. Extensive experiments show that our method outperforms both the SOTA search-based baseline (NAS-BERT) and SOTA distillation-based methods (such as DistilBERT, TinyBERT, MiniLM, and MobileBERT). In addition, based on the obtained architectures, we propose a more efficient development method that is even faster than developing a single PLM. The source code and models will be publicly available upon publication.
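
A minimal sketch of sampling architecture hyper-parameters from a one-shot search space under a latency budget, as the abstract describes; the dimension choices, the latency proxy, and the candidate count below are illustrative assumptions rather than the paper's exact configuration.

```python
import random

# Hypothetical search space over BERT-style architecture hyper-parameters.
SEARCH_SPACE = {
    "num_layers":       [3, 4, 5, 6],
    "hidden_dim":       [128, 192, 256, 384],
    "intermediate_dim": [256, 384, 512, 768],
    "num_heads":        [2, 4, 6, 8],
}


def sample_architecture():
    """Draw one candidate sub-architecture from the search space."""
    return {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}


def estimated_latency(arch):
    """Crude latency proxy proportional to parameter count (an assumption)."""
    per_layer = arch["hidden_dim"] * (4 * arch["hidden_dim"]
                                      + 2 * arch["intermediate_dim"])
    return arch["num_layers"] * per_layer


def search(latency_budget, num_candidates=100):
    """Keep sampled candidates that fit the latency budget; in a one-shot NAS
    setup, surviving candidates would be evaluated with weights inherited from
    the super-model and the best one fine-tuned."""
    candidates = (sample_architecture() for _ in range(num_candidates))
    return [a for a in candidates if estimated_latency(a) <= latency_budget]
```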