Lijie Wang


A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
Lijie Wang | Yaozong Shen | Shuyuan Peng | Shuai Zhang | Xinyan Xiao | Hao Liu | Hongxuan Tang | Ying Chen | Hua Wu | Haifeng Wang
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

While there is increasing concern about the interpretability of neural models, the evaluation of interpretability remains an open problem, due to the lack of proper evaluation datasets and metrics. In this paper, we present a novel benchmark to evaluate the interpretability of both neural models and saliency methods. This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension, each provided with both English and Chinese annotated data. In order to precisely evaluate the interpretability, we provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive. We also design a new metric, i.e., the consistency between the rationales before and after perturbations, to uniformly evaluate the interpretability on different types of tasks. Based on this benchmark, we conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability. We will release this benchmark and hope it can facilitate the research in building trustworthy systems.


Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing
Kun Wu | Lijie Wang | Zhenghua Li | Ao Zhang | Xinyan Xiao | Hua Wu | Min Zhang | Haifeng Wang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Data augmentation has attracted a lot of research attention in the deep learning era for its ability to alleviate data sparseness. The lack of labeled data for unseen evaluation databases is exactly the major challenge for cross-domain text-to-SQL parsing. Previous work either requires human intervention to guarantee the quality of generated data, or fails to handle complex SQL queries. This paper presents a simple yet effective data augmentation framework. First, given a database, we automatically produce a large number of SQL queries based on an abstract syntax tree grammar. For better distribution matching, we require that at least 80% of SQL patterns in the training data are covered by generated queries. Second, we propose a hierarchical SQL-to-question generation model to obtain high-quality natural language questions, which is the major contribution of this work. Finally, we design a simple sampling strategy that can greatly improve training efficiency given large amounts of generated data. Experiments on three cross-domain datasets, i.e., WikiSQL and Spider in English, and DuSQL in Chinese, show that our proposed data augmentation framework can consistently improve performance over strong baselines, and that the hierarchical generation component is the key to the improvement.
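
The 80% pattern-coverage requirement mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how a SQL pattern is defined or whether coverage is counted over distinct patterns, so everything here is an assumption for illustration: `sql_pattern` is a hypothetical helper that masks identifiers and literals, and `pattern_coverage` measures the share of distinct training patterns that also appear among the generated queries.

```python
import re

# Hypothetical keyword list; a real system would derive patterns from its grammar.
KEYWORDS = {
    "SELECT", "FROM", "WHERE", "GROUP", "BY", "ORDER", "HAVING", "LIMIT",
    "JOIN", "ON", "AS", "AND", "OR", "NOT", "IN", "LIKE", "BETWEEN",
    "DESC", "ASC", "DISTINCT", "COUNT", "SUM", "AVG", "MIN", "MAX",
    "UNION", "INTERSECT", "EXCEPT",
}

TOKEN_RE = re.compile(r"'[^']*'|\"[^\"]*\"|\d+(?:\.\d+)?|\w+|[^\s\w]")


def sql_pattern(query: str) -> str:
    """Reduce a query to a coarse pattern: keywords kept, names and values masked."""
    out = []
    for tok in TOKEN_RE.findall(query):
        if tok.upper() in KEYWORDS:
            out.append(tok.upper())
        elif tok[0] in "'\"" or tok[0].isdigit():
            out.append("VALUE")   # string or numeric literal
        elif tok[0].isalnum() or tok[0] == "_":
            out.append("ID")      # table or column name
        else:
            out.append(tok)       # punctuation and operators
    return " ".join(out)


def pattern_coverage(train_queries, generated_queries) -> float:
    """Fraction of distinct training patterns that occur in the generated set."""
    train_patterns = {sql_pattern(q) for q in train_queries}
    generated_patterns = {sql_pattern(q) for q in generated_queries}
    if not train_patterns:
        return 1.0
    return len(train_patterns & generated_patterns) / len(train_patterns)


train = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT count(*) FROM concert",
    "SELECT name FROM singer ORDER BY age DESC LIMIT 1",
]
generated = [
    "SELECT title FROM movie WHERE year > 1999",
    "SELECT count(*) FROM actor",
]
coverage = pattern_coverage(train, generated)
meets_threshold = coverage >= 0.8  # the 80% requirement stated in the abstract
```

In this toy example two of the three training patterns are covered, so a generator under this check would need to keep producing queries until the threshold is met.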


LIT Team’s System Description for Japanese-Chinese Machine Translation Task in IWSLT 2020
Yimeng Zhuang | Yuan Zhang | Lijie Wang
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes the LIT Team’s submission to the IWSLT 2020 open domain translation task, focusing primarily on the Japanese-to-Chinese translation direction. Our system is based on the organizers’ baseline system, and we improve the Transformer baseline mainly through elaborate data pre-processing. We obtain significant improvements, and this paper aims to share some data processing experiences from this translation task. Large-scale back-translation on a monolingual corpus is also investigated. In addition, we try shared and exclusive word embeddings and compare different token granularities, such as the sub-word level. Our Japanese-to-Chinese translation system achieves a BLEU score of 34.0 and ranks 2nd among all participating systems.

DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset
Lijie Wang | Ao Zhang | Kun Wu | Ke Sun | Zhenghua Li | Hua Wu | Min Zhang | Haifeng Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Due to the lack of labeled data, previous research on text-to-SQL parsing mainly focuses on English. Representative English datasets include ATIS, WikiSQL, Spider, etc. This paper presents DuSQL, a large-scale and pragmatic Chinese dataset for the cross-domain text-to-SQL task, containing 200 databases, 813 tables, and 23,797 question/SQL pairs. Our new dataset has three major characteristics. First, by manually analyzing questions from several representative applications, we try to figure out the true distribution of SQL queries in real-life needs. Second, DuSQL contains a considerable proportion of SQL queries involving row or column calculations, motivated by our analysis of the SQL query distributions. Finally, we adopt an effective data construction framework via human-computer collaboration. The basic idea is to automatically generate SQL queries based on the SQL grammar and constrained by the given database. This paper describes in detail the construction process and data statistics of DuSQL. Moreover, we present and compare the performance of several open-source text-to-SQL parsers with minor modifications to accommodate Chinese, including a simple yet effective extension to IRNet for handling calculation SQL queries.


End-to-end Speech Translation System Description of LIT for IWSLT 2019
Mei Tu | Wei Liu | Lijie Wang | Xiao Chen | Xue Wen
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes our end-to-end speech translation system for the speech translation task of lectures and TED talks from English to German for the IWSLT 2019 evaluation. We propose layer-tied self-attention for end-to-end speech translation. Our method takes advantage of sharing the weights of the speech encoder and the text decoder. The representation of source speech and the representation of target text are coordinated layer by layer, so that the speech and text can learn a better alignment during training. We also adopt data augmentation to enhance the parallel speech-text corpus. The En-De experimental results show that our best model achieves 17.68 on tst2015. Our ASR achieves a WER of 6.6% on the TED-LIUM test set. The En-Pt model achieves about 11.83 on the MuST-C dev set.
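
For readers unfamiliar with the WER figure quoted above, word error rate is conventionally the word-level edit distance (substitutions, insertions and deletions) between a hypothesis and its reference, divided by the number of reference words. The sketch below shows that standard definition for a single sentence pair; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, not the scoring setup used in the paper, and reported scores are normally aggregated over a whole test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.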