Yuexin Wu


2022

Token Dropping for Efficient BERT Pretraining
Le Hou | Richard Yuanzhe Pang | Tianyi Zhou | Yuexin Wu | Xinying Song | Xiaodan Song | Denny Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Transformer-based models generally allocate the same amount of computation for each token in a given sequence. We develop a simple but effective “token dropping” method to accelerate the pretraining of transformer models, such as BERT, without degrading their performance on downstream tasks. In particular, we drop unimportant tokens starting from an intermediate layer in the model so that the model focuses on important tokens more efficiently under limited computational resources. The dropped tokens are later picked up by the last layer of the model so that the model still produces full-length sequences. We leverage the already built-in masked language modeling (MLM) loss to identify unimportant tokens with practically no computational overhead. In our experiments, this simple approach reduces the pretraining cost of BERT by 25% while achieving similar overall fine-tuning performance on standard downstream tasks.
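
The drop-then-merge idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-token importance scores are assumed to be given (for example, derived from the MLM loss), scalars stand in for hidden vectors, and the names `select_kept_positions`, `middle_layers_with_dropping`, `keep_ratio`, and `layer_fn` are hypothetical.

```python
def select_kept_positions(importance, keep_ratio):
    """Return the sorted positions of the highest-scoring tokens (assumed scores)."""
    n_keep = max(1, int(len(importance) * keep_ratio))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:n_keep])


def middle_layers_with_dropping(hidden, importance, keep_ratio, layer_fn):
    """Run layer_fn only on kept tokens, then merge back to full length.

    hidden:     one state per token (scalars here, vectors in a real model)
    importance: one score per token; higher means more important
    layer_fn:   hypothetical stand-in for the intermediate transformer layers
    """
    kept = select_kept_positions(importance, keep_ratio)
    processed = layer_fn([hidden[i] for i in kept])
    # Dropped tokens keep their earlier states, so the sequence the last
    # layer sees is still full length.
    merged = list(hidden)
    for pos, state in zip(kept, processed):
        merged[pos] = state
    return merged, kept


if __name__ == "__main__":
    hidden = [1.0, 2.0, 3.0, 4.0]
    importance = [0.1, 0.9, 0.5, 0.7]
    merged, kept = middle_layers_with_dropping(
        hidden, importance, keep_ratio=0.5, layer_fn=lambda xs: [2 * x for x in xs]
    )
    print(kept, merged)
```

With `keep_ratio=0.5`, only half of the tokens pass through the stand-in layers, which is where the compute saving in such a scheme would come from.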

Unsupervised Reinforcement Adaptation for Class-Imbalanced Text Classification
Yuexin Wu | Xiaolei Huang
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

Class imbalance naturally exists when label distributions are not aligned across source and target domains. However, existing state-of-the-art unsupervised domain adaptation (UDA) models learn domain-invariant representations across domains and evaluate primarily on class-balanced data. In this work, we propose a UDA approach via reinforcement learning that jointly leverages feature variants and imbalanced labels across domains. We experiment with the text classification task for its easily accessible datasets and compare the proposed method with five baselines. Experiments on three datasets show that our proposed method can effectively learn robust domain-invariant representations and successfully adapt text classifiers on imbalanced classes over domains.

2018

Unsupervised Cross-lingual Transfer of Word Embedding Spaces
Ruochen Xu | Yiming Yang | Naoki Otani | Yuexin Wu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Cross-lingual transfer of word embeddings aims to establish the semantic mappings among words in different languages by learning the transformation functions over the corresponding word embedding spaces. Successfully solving this problem would benefit many downstream tasks, such as transferring text classification models from resource-rich languages (e.g., English) to low-resource languages. Supervised methods for this problem rely on the availability of cross-lingual supervision, using either parallel corpora or bilingual lexicons as the labeled data for training, which may not be available for many low-resource languages. This paper proposes an unsupervised learning approach that does not require any cross-lingual labeled data. Given two monolingual word embedding spaces for any language pair, our algorithm optimizes the transformation functions in both directions simultaneously based on distributional matching as well as minimizing the back-translation losses. We use a neural network implementation to calculate the Sinkhorn distance, a well-defined distributional similarity measure, and optimize our objective through back-propagation. Our evaluation on benchmark datasets for bilingual lexicon induction and cross-lingual word similarity prediction shows stronger or competitive performance of the proposed method compared to other state-of-the-art supervised and unsupervised baseline methods over many language pairs.
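
For readers unfamiliar with the Sinkhorn distance mentioned in this abstract, the following is a minimal sketch of the standard entropy-regularized Sinkhorn iteration on a small cost matrix. It is not the paper's neural network implementation: uniform marginals are assumed, and the function name `sinkhorn_distance` and the parameters `reg` and `n_iters` are illustrative choices.

```python
import math


def sinkhorn_distance(cost, reg=0.1, n_iters=200):
    """Approximate the transport cost between two uniform distributions.

    cost:    n x m list of lists with pairwise costs
    reg:     entropy regularization strength (assumed value)
    n_iters: number of scaling iterations (assumed value)
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform source marginal (assumption)
    b = [1.0 / m] * m  # uniform target marginal (assumption)
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return dist, plan


if __name__ == "__main__":
    dist, plan = sinkhorn_distance([[0.0, 1.0], [1.0, 0.0]])
    print(dist, plan)
```

Because every step is a differentiable arithmetic operation, an iteration of this kind can in principle be unrolled inside an automatic differentiation framework, which is consistent with the abstract's mention of optimizing through back-propagation.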

Contextual Encoding for Translation Quality Estimation
Junjie Hu | Wei-Cheng Chang | Yuexin Wu | Graham Neubig
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The task of word-level quality estimation (QE) consists of taking a source sentence and a machine-generated translation, and predicting which words in the output are correct and which are wrong. In this paper, we propose a method to effectively encode the local and global contextual information for each target word using a three-part neural network approach. The first part uses an embedding layer to represent words and their part-of-speech tags in both languages. The second part leverages a one-dimensional convolution layer to integrate local context information for each target word. The third part applies a stack of feed-forward and recurrent neural networks to further encode the global context in the sentence before making the predictions. This model was submitted as the CMU entry to the WMT2018 shared task on QE, and achieves strong results, ranking first in three of the six tracks.

2014

Group based Self Training for E-Commerce Product Record Linkage
Xin Zhao | Yuexin Wu | Hongfei Yan | Xiaoming Li
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers