Yong Zhang


2022

pdf
E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models
Mohammad Akbari | Amin Banitalebi-Dehkordi | Yong Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Building huge and highly capable language models has been a trend in the past years. Despite their great performance, they incur high computational cost. A common solution is to apply model compression or choose light-weight architectures, which often need a separate fixed-size model for each desirable computational budget, and may lose performance in case of heavy compression. This paper proposes an effective dynamic inference approach, called E-LANG, which distributes the inference between large accurate Super-models and light-weight Swift models. To this end, a decision making module routes the inputs to Super or Swift models based on the energy characteristics of the representations in the latent space. This method is easily adoptable and architecture agnostic. As such, it can be applied to black-box pre-trained models without a need for architectural manipulations, reassembling of modules, or re-training. Unlike existing methods that are only applicable to encoder-only backbones and classification tasks, our method also works for encoder-decoder structures and sequence-to-sequence tasks such as translation. The E-LANG performance is verified through a set of experiments with T5 and BERT backbones on GLUE, SuperGLUE, and WMT. In particular, we outperform T5-11B with an average computations speed-up of 3.3X on GLUE and 2.9X on SuperGLUE. We also achieve BERT-based SOTA on GLUE with 3.2X less computations. Code and demo are available in supplementary materials.

pdf
Unsupervised Sentence Textual Similarity with Compositional Phrase Semantics
Zihao Wang | Jiaheng Dou | Yong Zhang
Proceedings of the 29th International Conference on Computational Linguistics

Measuring Sentence Textual Similarity (STS) is a classic task that can be applied to many downstream NLP applications such as text generation and retrieval. In this paper, we focus on unsupervised STS that works on various domains but only requires minimal data and computational resources. Theoretically, we propose a light-weighted Expectation-Correction (EC) formulation for STS computation. EC formulation unifies unsupervised STS approaches including the cosine similarity of Additively Composed (AC) sentence embeddings, Optimal Transport (OT), and Tree Kernels (TK). Moreover, we propose the Recursive Optimal Transport Similarity (ROTS) algorithm to capture the compositional phrase semantics by composing multiple recursive EC formulations. ROTS finishes in linear time and is faster than its predecessors. ROTS is empirically more effective and scalable than previous approaches. Extensive experiments on 29 STS tasks under various settings show the clear advantage of ROTS over existing approaches. Detailed ablation studies prove the effectiveness of our approaches.

2021

pdf
Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning
Li Zhou | Kevin Small | Yong Zhang | Sandeep Atluri
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Motivated by suggested question generation in conversational news recommendation systems, we propose a model for generating question-answer pairs (QA pairs) with self-contained, summary-centric questions and length-constrained, article-summarizing answers. We begin by collecting a new dataset of news articles with questions as titles and pairing them with summaries of varying length. This dataset is used to learn a QA pair generation model producing summaries as answers that balance brevity with sufficiency jointly with their corresponding questions. We then reinforce the QA pair generation process with a differentiable reward function to mitigate exposure bias, a common problem in natural language generation. Both automatic metrics and human evaluation demonstrate these QA pairs successfully capture the central gists of the articles and achieve high answer accuracy.

2020

pdf
A Relaxed Matching Procedure for Unsupervised BLI
Xu Zhao | Zihao Wang | Yong Zhang | Hao Wu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently unsupervised Bilingual Lexicon Induction(BLI) without any parallel corpus has attracted much research interest. One of the crucial parts in methods for the BLI task is the matching procedure. Previous works impose a too strong constraint on the matching and lead to many counterintuitive translation pairings. Thus We propose a relaxed matching procedure to find a more precise matching between two languages. We also find that aligning source and target language embedding space bidirectionally will bring significant improvement. We follow the previous iterative framework to conduct experiments. Results on standard benchmark demonstrate the effectiveness of our proposed method, which substantially outperforms previous unsupervised methods.

pdf
Semi-Supervised Bilingual Lexicon Induction with Two-way Interaction
Xu Zhao | Zihao Wang | Hao Wu | Yong Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Semi-supervision is a promising paradigm for Bilingual Lexicon Induction (BLI) with limited annotations. However, previous semisupervised methods do not fully utilize the knowledge hidden in annotated and nonannotated data, which hinders further improvement of their performance. In this paper, we propose a new semi-supervised BLI framework to encourage the interaction between the supervised signal and unsupervised alignment. We design two message-passing mechanisms to transfer knowledge between annotated and non-annotated data, named prior optimal transport and bi-directional lexicon update respectively. Then, we perform semi-supervised learning based on a cyclic or a parallel parameter feeding routine to update our models. Our framework is a general framework that can incorporate any supervised and unsupervised BLI methods based on optimal transport. Experimental results on MUSE and VecMap datasets show significant improvement of our models. Ablation study also proves that the two-way interaction between the supervised signal and unsupervised alignment accounts for the gain of the overall performance. Results on distant language pairs further illustrate the advantage and robustness of our proposed method.

2017

pdf
MainiwayAI at IJCNLP-2017 Task 2: Ensembles of Deep Architectures for Valence-Arousal Prediction
Yassine Benajiba | Jin Sun | Yong Zhang | Zhiliang Weng | Or Biran
Proceedings of the IJCNLP 2017, Shared Tasks

This paper introduces Mainiway AI Labs submitted system for the IJCNLP 2017 shared task on Dimensional Sentiment Analysis of Chinese Phrases (DSAP), and related experiments. Our approach consists of deep neural networks with various architectures, and our best system is a voted ensemble of networks. We achieve a Mean Absolute Error of 0.64 in valence prediction and 0.68 in arousal prediction on the test set, both placing us as the 5th ranked team in the competition.

pdf
The Sentimental Value of Chinese Sub-Character Components
Yassine Benajiba | Or Biran | Zhiliang Weng | Yong Zhang | Jin Sun
Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing

Sub-character components of Chinese characters carry important semantic information, and recent studies have shown that utilizing this information can improve performance on core semantic tasks. In this paper, we hypothesize that in addition to semantic information, sub-character components may also carry emotional information, and that utilizing it should improve performance on sentiment analysis tasks. We conduct a series of experiments on four Chinese sentiment data sets and show that we can significantly improve the performance in various tasks over that of a character-level embeddings baseline. We then focus on qualitatively assessing multiple examples and trying to explain how the sub-character components affect the results in each case.