Yekun Chai


2023

pdf
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Yekun Chai | Shuohuan Wang | Chao Pang | Yu Sun | Hao Tian | Hua Wu
Findings of the Association for Computational Linguistics: ACL 2023

Software engineers working with the same programming language (PL) may speak different natural languages (NLs), and vice versa, creating substantial barriers to communication and working efficiency. Recent studies have demonstrated the effectiveness of generative pre-training on computer programs, yet these models are invariably English-centric. In this work, we take a step towards bridging the gap between multilingual NLs and multilingual PLs for large language models (LLMs). We release ERNIE-Code, a unified pre-trained language model for 116 NLs and 6 PLs. We employ two methods for universal cross-lingual pre-training: span-corruption language modeling, which learns patterns from monolingual NL or PL data, and pivot-based translation language modeling, which relies on parallel data across many NLs and PLs. Extensive results show that ERNIE-Code outperforms previous multilingual LLMs for PL or NL across a wide range of code-intelligence end tasks, including multilingual code-to-text, text-to-code, code-to-code, and text-to-text generation. We further show its advantage in zero-shot prompting for multilingual code summarization and text-to-text translation. We release our code and pre-trained checkpoints.
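As an illustration of the span-corruption objective mentioned above, the following minimal sketch builds a T5-style corruption example from a tokenized code snippet; the sentinel token names and hyper-parameters are illustrative assumptions, not ERNIE-Code's actual vocabulary or recipe.

```python
import random

def span_corrupt(tokens, noise_density=0.15, mean_span_len=3, seed=0):
    """T5-style span corruption over a token list (NL or PL)."""
    rng = random.Random(seed)
    inputs, targets, sentinel, i = [], [], 0, 0
    while i < len(tokens):
        # Start a corrupted span with probability noise_density / mean_span_len,
        # so roughly `noise_density` of the tokens get masked on average.
        if rng.random() < noise_density / mean_span_len:
            span_len = max(1, round(rng.expovariate(1.0 / mean_span_len)))
            inputs.append(f"<extra_id_{sentinel}>")          # placeholder in encoder input
            targets.append(f"<extra_id_{sentinel}>")          # marker in decoder target
            targets.extend(tokens[i:i + span_len])            # the dropped span to reconstruct
            sentinel += 1
            i += span_len
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

src, tgt = span_corrupt("def add ( a , b ) : return a + b".split(), seed=3)
print(src)  # encoder input with sentinel placeholders
print(tgt)  # decoder target reconstructing the corrupted spans
```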

2022

pdf
Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards
Yekun Chai | Shuohuan Wang | Yu Sun | Hao Tian | Hua Wu | Haifeng Wang
Findings of the Association for Computational Linguistics: EMNLP 2022

Derivative-free prompt learning has emerged as a lightweight alternative to prompt tuning, requiring only model inference to optimize the prompts. However, existing work has not taken full advantage of the over-parameterized nature of large pre-trained language models (PLMs). In this paper, we propose Clip-Tuning, a simple yet effective method that adopts diverse frozen “thinned” networks of PLMs to obtain a mixture of rewards and thus advance derivative-free prompt learning. The thinned networks consist of all the hidden units that survive a stationary dropout strategy, and their inference predictions reflect an ensemble of partial views over the prompted training samples. Our method outperforms previous gradient-free prompt learning methods and achieves parity with gradient-based counterparts on seven language understanding benchmarks under few-shot settings.
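A minimal sketch of the mixture-of-rewards idea: the same frozen model is evaluated under several fixed dropout seeds, and the rewards from these “thinned” subnetworks are averaged to score a candidate prompt for a black-box optimizer. The toy model, additive prompt injection, and negative cross-entropy reward are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class TinyPLM(nn.Module):
    """Stand-in for a frozen pre-trained classifier head (illustrative only)."""
    def __init__(self, dim=32, n_classes=2, p=0.1):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(p))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        return self.head(self.encoder(x))

@torch.no_grad()
def mixture_of_rewards(model, prompt, x, y, n_views=4):
    """Average the reward over several dropout-'thinned' views of the frozen model."""
    model.train()                           # keep dropout active: each pass is a thinned network
    rewards = []
    for seed in range(n_views):
        torch.manual_seed(seed)             # fixed seeds give a stationary set of thinned views
        logits = model(x + prompt)          # prompt injected additively (a simplification)
        rewards.append(-nn.functional.cross_entropy(logits, y).item())
    return sum(rewards) / len(rewards)

model = TinyPLM()
x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
prompt = torch.randn(32) * 0.01             # candidate prompt proposed by a derivative-free optimizer
print(mixture_of_rewards(model, prompt, x, y))
```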

pdf
Predicate-Argument Based Bi-Encoder for Paraphrase Identification
Qiwei Peng | David Weir | Julie Weeds | Yekun Chai
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Paraphrase identification involves deciding whether a pair of sentences express the same or similar meanings. While cross-encoders have achieved high performance across several benchmarks, bi-encoders such as SBERT have been widely applied to sentence-pair tasks: they exhibit substantially lower computational complexity and are better suited to symmetric tasks. In this work, we adopt a bi-encoder approach to the paraphrase identification task and investigate the impact of explicitly incorporating predicate-argument information into SBERT through weighted aggregation. Experiments on six paraphrase identification datasets demonstrate that, with a minimal increase in parameters, the proposed model significantly outperforms SBERT/SRoBERTa. Further, ablation studies reveal that the predicate-argument based component plays a significant role in the performance gain.
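One simplified reading of the weighted aggregation described above is a pooling step that up-weights tokens inside predicate-argument spans before producing the sentence vector; the weight `alpha` and the binary span mask below are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def weighted_sentence_embedding(token_embs, pa_mask, alpha=2.0):
    """Pool token embeddings, up-weighting tokens in predicate/argument spans."""
    weights = torch.where(pa_mask.bool(),
                          torch.full_like(pa_mask, alpha, dtype=torch.float),
                          torch.ones_like(pa_mask, dtype=torch.float))
    weights = weights / weights.sum(dim=-1, keepdim=True)          # normalize per sentence
    return (weights.unsqueeze(-1) * token_embs).sum(dim=1)         # weighted mean pooling

embs = torch.randn(2, 6, 768)               # [batch, tokens, hidden], e.g. SBERT token outputs
pa = torch.tensor([[0, 1, 1, 0, 1, 0],
                   [1, 1, 0, 0, 0, 1]])     # 1 = token belongs to a predicate/argument span
print(weighted_sentence_embedding(embs, pa).shape)   # torch.Size([2, 768])
```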

pdf
X-PuDu at SemEval-2022 Task 6: Multilingual Learning for English and Arabic Sarcasm Detection
Yaqian Han | Yekun Chai | Shuohuan Wang | Yu Sun | Hongyi Huang | Guanghao Chen | Yitong Xu | Yang Yang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Detecting sarcasm and verbal irony in people’s subjective statements is crucial to understanding their intended meanings, real sentiments, and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval - Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various natural language understanding settings. Our solution fine-tunes pre-trained language models, such as ERNIE-M and DeBERTa, under multilingual settings to recognize irony in Arabic and English texts. Our system ranked second out of 43 and ninth out of 32 in Task A (one-sentence detection in English and Arabic, respectively); fifth out of 22 in Task B (binary multi-label classification in English); and first out of 16 and fifth out of 13 in Task C (sentence-pair detection in English and Arabic, respectively).
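For context, fine-tuning a pre-trained model for one-sentence sarcasm detection can be sketched as below; the checkpoint name, label count, and hyper-parameters are illustrative, not the exact X-PuDu configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/deberta-v3-base"                 # placeholder checkpoint, not X-PuDu's exact one
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["Oh great, another Monday.", "The weather is lovely today."]
labels = torch.tensor([1, 0])                      # 1 = sarcastic, 0 = not sarcastic
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
loss = model(**batch, labels=labels).loss          # cross-entropy over the two classes
loss.backward()
optim.step()
```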

2021

pdf
COIN: Conversational Interactive Networks for Emotion Recognition in Conversation
Haidong Zhang | Yekun Chai
Proceedings of the Third Workshop on Multimodal Artificial Intelligence

Emotion recognition in conversation has recently received considerable attention because of its practical industrial applications. Existing methods tend to overlook the immediate mutual interaction between different speakers at the speaker-utterance level, or apply a single speaker-agnostic RNN to utterances from different speakers. We propose COIN, a conversational interactive model that mitigates this problem by applying state mutual interaction within the history context. In addition, we introduce a stacked global interaction module to capture the contextual and inter-dependency representations in a hierarchical manner. To improve robustness and generalization during training, we generate adversarial examples by applying minor perturbations to the multimodal feature inputs, unveiling the benefits of adversarial examples for emotion detection. The proposed model achieves state-of-the-art results on the IEMOCAP benchmark dataset.
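A minimal sketch of the adversarial-example step mentioned above, using an FGSM-style perturbation of the input features; the stand-in classifier, epsilon, and attack choice are assumptions and may differ from COIN's actual training procedure.

```python
import torch
import torch.nn.functional as F

def adversarial_features(model, feats, labels, epsilon=1e-2):
    """Return input features perturbed along the gradient sign of the loss."""
    feats = feats.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(feats), labels)
    loss.backward()
    return (feats + epsilon * feats.grad.sign()).detach()

model = torch.nn.Linear(100, 6)             # stand-in for the emotion classifier
feats = torch.randn(4, 100)                 # fused multimodal features (illustrative)
labels = torch.randint(0, 6, (4,))
adv = adversarial_features(model, feats, labels)
# train on both `feats` and `adv` to improve robustness and generalization
```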

pdf
Counter-Contrastive Learning for Language GANs
Yekun Chai | Haidong Zhang | Qiyue Yin | Junge Zhang
Findings of the Association for Computational Linguistics: EMNLP 2021

Generative Adversarial Networks (GANs) have achieved great success in image synthesis but have proven difficult to apply to natural language generation. Challenges arise from the uninformative learning signals passed from the discriminator: these poor signals limit the generator’s capacity to produce language with rich structure and semantics. In this paper, we propose counter-contrastive learning (CCL) to support the generator’s training in language GANs. In contrast to standard GANs, which adopt a simple binary classifier to discriminate whether a sample is real or fake, we employ a counter-contrastive learning signal that advances the training of language synthesizers by (1) pulling the representations of generated and real samples together and (2) pushing representations of real samples apart, competing with the discriminator and thus preventing it from being overtrained. We evaluate our method on both synthetic and real benchmarks and achieve competitive performance compared to previous GANs for adversarial sequence generation.
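An illustrative formulation of such a counter-contrastive signal is sketched below: the generator is rewarded for fake-real similarity (pulling generated and real representations together) relative to real-real similarity (pushing real representations apart). This is a hedged sketch of the idea, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def counter_contrastive_loss(real_repr, fake_repr, tau=1.0):
    """Counter-contrastive generator signal (illustrative formulation)."""
    real = F.normalize(real_repr, dim=-1)
    fake = F.normalize(fake_repr, dim=-1)
    pos = torch.exp(fake @ real.t() / tau).sum(dim=-1)   # fake-real similarity to pull together
    neg = torch.exp(real @ real.t() / tau).sum()         # real-real similarity to push apart
    return -torch.log(pos / (pos + neg)).mean()

real = torch.randn(16, 128)   # discriminator features of real sentences
fake = torch.randn(16, 128)   # discriminator features of generated sentences
print(counter_contrastive_loss(real, fake))
```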

2020

pdf
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
Yekun Chai | Shuo Jin | Xinwen Hou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Self-attention mechanisms have achieved striking state-of-the-art (SOTA) results in various sequence learning tasks, building on multi-headed dot-product attention that attends to the full global context at every position. Inspired by information highways, we introduce a gated component, self-dependency units (SDU), which incorporates LSTM-style gating to replenish the internal semantic importance within the multi-dimensional latent space of individual representations. The subsidiary content-based SDU gates allow modulated latent embeddings to flow through skip connections, yielding a clear margin in convergence speed under gradient-descent optimization. We further examine the role of the gating mechanism in aiding context-based Transformer modules, hypothesizing that SDU gates, especially in shallow layers, push the optimization more quickly towards (sub)optimal points.
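A minimal sketch of a self-dependency unit as described above: an LSTM-style, content-based gate applied to each position's own representation and added back through a skip connection. Layer sizes and the exact placement inside the Transformer block are assumptions and follow the paper only approximately.

```python
import torch
import torch.nn as nn

class SDU(nn.Module):
    """Self-dependency unit: content-based gating plus a skip connection."""
    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)       # sigmoid gate computed from the token itself
        self.transform = nn.Linear(d_model, d_model)  # candidate (modulated) content

    def forward(self, x):                             # x: [batch, seq, d_model]
        g = torch.sigmoid(self.gate(x))
        return x + g * torch.tanh(self.transform(x))  # skip connection + gated content

h = torch.randn(2, 16, 512)
print(SDU(512)(h).shape)                              # torch.Size([2, 16, 512])
```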