Di Wang


IM^2: an Interpretable and Multi-category Integrated Metric Framework for Automatic Dialogue Evaluation
Zhihua Jiang | Guanghui Ye | Dongning Rao | Di Wang | Xin Miao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Evaluation metrics shine the light on the best models and thus strongly influence the research directions, such as the recently developed dialogue metrics USR, FED, and GRADE. However, most current metrics evaluate the dialogue data as isolated and static because they only focus on a single quality or several qualities. To mitigate the problem, this paper proposes an interpretable, multi-faceted, and controllable framework IM^2 (Interpretable and Multi-category Integrated Metric) to combine a large number of metrics which are good at measuring different qualities. The IM^2 framework first divides current popular dialogue qualities into different categories and then applies or proposes dialogue metrics to measure the qualities within each category and finally generates an overall IM^2 score. An initial version of IM^2 was submitted to the AAAI 2022 Track5.1@DSTC10 challenge and took the 2^nd place on both of the development and test leaderboard. After the competition, we develop more metrics and improve the performance of our model. We compare IM^2 with other 13 current dialogue metrics and experimental results show that IM^2 correlates more strongly with human judgments than any of them on each evaluated dataset.


UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction
Huanqin Wu | Wei Liu | Lei Li | Dan Nie | Tao Chen | Feng Zhang | Di Wang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction
Shulin Liu | Tao Yang | Tianchi Yue | Feng Zhang | Di Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Chinese spelling correction (CSC) is a task to detect and correct spelling errors in texts. CSC is essentially a linguistic problem, thus the ability of language understanding is crucial to this task. In this paper, we propose a Pre-trained masked Language model with Misspelled knowledgE (PLOME) for CSC, which jointly learns how to understand language and correct spelling errors. To this end, PLOME masks the chosen tokens with similar characters according to a confusion set rather than the fixed token “[MASK]” as in BERT. Besides character prediction, PLOME also introduces pronunciation prediction to learn the misspelled knowledge on phonic level. Moreover, phonological and visual similarity knowledge is important to this task. PLOME utilizes GRU networks to model such knowledge based on characters’ phonics and strokes. Experiments are conducted on widely used benchmarks. Our method achieves superior performance against state-of-the-art approaches by a remarkable margin. We release the source code and pre-trained model for further use by the community (https://github.com/liushulinle/PLOME).

Concept-Based Label Embedding via Dynamic Routing for Hierarchical Text Classification
Xuepeng Wang | Li Zhao | Bing Liu | Tao Chen | Feng Zhang | Di Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Hierarchical Text Classification (HTC) is a challenging task that categorizes a textual description within a taxonomic hierarchy. Most of the existing methods focus on modeling the text. Recently, researchers attempt to model the class representations with some resources (e.g., external dictionaries). However, the concept shared among classes which is a kind of domain-specific and fine-grained information has been ignored in previous work. In this paper, we propose a novel concept-based label embedding method that can explicitly represent the concept and model the sharing mechanism among classes for the hierarchical text classification. Experimental results on two widely used datasets prove that the proposed model outperforms several state-of-the-art methods. We release our complementary resources (concepts and definitions of classes) for these two datasets to benefit the research on HTC.


Density Matching for Bilingual Word Embedding
Chunting Zhou | Xuezhe Ma | Di Wang | Graham Neubig
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages. In this paper, we propose an approach that instead expresses the two monolingual embedding spaces as probability densities defined by a Gaussian mixture model, and matches the two densities using a method called normalizing flow. The method requires no explicit supervision, and can be learned with only a seed dictionary of words that have identical strings. We argue that this formulation has several intuitively attractive properties, particularly with the respect to improving robustness and generalization to mappings between difficult language pairs or word pairs. On a benchmark data set of bilingual lexicon induction and cross-lingual word similarity, our approach can achieve competitive or superior performance compared to state-of-the-art published results, with particularly strong results being found on etymologically distant and/or morphologically rich languages.

Automatic Arabic Text Summarization Based on Fuzzy Logic
Lamees Al Qassem | Di Wang | Hassan Barada | Ahmad Al-Rubaie | Nawaf Almoosa
Proceedings of the 3rd International Conference on Natural Language and Speech Processing

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Zhiting Hu | Haoran Shi | Bowen Tan | Wentao Wang | Zichao Yang | Tiancheng Zhao | Junxian He | Lianhui Qin | Di Wang | Xuezhe Ma | Zhengzhong Liu | Xiaodan Liang | Wanrong Zhu | Devendra Sachan | Eric Xing
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We introduce Texar, an open-source toolkit aiming to support the broad set of text generation tasks that transform any inputs into natural language, such as machine translation, summarization, dialog, content manipulation, and so forth. With the design goals of modularity, versatility, and extensibility in mind, Texar extracts common patterns underlying the diverse tasks and methodologies, creates a library of highly reusable modules and functionalities, and allows arbitrary model architectures and algorithmic paradigms. In Texar, model architecture, inference, and learning processes are properly decomposed. Modules at a high concept level can be freely assembled or plugged in/swapped out. Texar is thus particularly suitable for researchers and practitioners to do fast prototyping and experimentation. The versatile toolkit also fosters technique sharing across different text generation tasks. Texar supports both TensorFlow and PyTorch, and is released under Apache License 2.0 at https://www.texar.io.

Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations
Peixiang Zhong | Di Wang | Chunyan Miao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Messages in human conversations inherently convey emotions. The task of detecting emotions in textual conversations leads to a wide range of applications such as opinion mining in social networks. However, enabling machines to analyze emotions in conversations is challenging, partly because humans often rely on the context and commonsense knowledge to express emotions. In this paper, we address these challenges by proposing a Knowledge-Enriched Transformer (KET), where contextual utterances are interpreted using hierarchical self-attention and external commonsense knowledge is dynamically leveraged using a context-aware affective graph attention mechanism. Experiments on multiple textual conversation datasets demonstrate that both context and commonsense knowledge are consistently beneficial to the emotion detection performance. In addition, the experimental results show that our KET model outperforms the state-of-the-art models on most of the tested datasets in F1 score.


Texar: A Modularized, Versatile, and Extensible Toolbox for Text Generation
Zhiting Hu | Zichao Yang | Tiancheng Zhao | Haoran Shi | Junxian He | Di Wang | Xuezhe Ma | Zhengzhong Liu | Xiaodan Liang | Lianhui Qin | Devendra Singh Chaplot | Bowen Tan | Xingjiang Yu | Eric Xing
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

We introduce Texar, an open-source toolkit aiming to support the broad set of text generation tasks. Different from many existing toolkits that are specialized for specific applications (e.g., neural machine translation), Texar is designed to be highly flexible and versatile. This is achieved by abstracting the common patterns underlying the diverse tasks and methodologies, creating a library of highly reusable modules and functionalities, and enabling arbitrary model architectures and various algorithmic paradigms. The features make Texar particularly suitable for technique sharing and generalization across different text generation applications. The toolkit emphasizes heavily on extensibility and modularized system design, so that components can be freely plugged in or swapped out. We conduct extensive experiments and case studies to demonstrate the use and advantage of the toolkit.

Tackling Adversarial Examples in QA via Answer Sentence Selection
Yuanhang Ren | Ye Du | Di Wang
Proceedings of the Workshop on Machine Reading for Question Answering

Question answering systems deteriorate dramatically in the presence of adversarial sentences in articles. According to Jia and Liang (2017), the single BiDAF system (Seo et al., 2016) only achieves an F1 score of 4.8 on the ADDANY adversarial dataset. In this paper, we present a method to tackle this problem via answer sentence selection. Given a paragraph of an article and a corresponding query, instead of directly feeding the whole paragraph to the single BiDAF system, a sentence that most likely contains the answer to the query is first selected, which is done via a deep neural network based on TreeLSTM (Tai et al., 2015). Experiments on ADDANY adversarial dataset validate the effectiveness of our method. The F1 score has been improved to 52.3.


Steering Output Style and Topic in Neural Response Generation
Di Wang | Nebojsa Jojic | Chris Brockett | Eric Nyberg
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose simple and flexible training and decoding methods for influencing output style and topic in neural encoder-decoder based language generation. This capability is desirable in a variety of applications, including conversational systems, where successful agents need to produce language in a specific style and generate responses steered by a human puppeteer or external knowledge. We decompose the neural generation process into empirically easier sub-problems: a faithfulness model and a decoding method based on selective-sampling. We also describe training and sampling algorithms that bias the generation process with a specific language style restriction, or a topic restriction. Human evaluation results show that our proposed methods are able to to restrict style and topic without degrading output quality in conversational tasks.


A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering
Di Wang | Eric Nyberg
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)


The Language Application Grid
Nancy Ide | James Pustejovsky | Christopher Cieri | Eric Nyberg | Di Wang | Keith Suderman | Marc Verhagen | Jonathan Wright
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Language Application (LAPPS) Grid project is establishing a framework that enables language service discovery, composition, and reuse and promotes sustainability, manageability, usability, and interoperability of natural language Processing (NLP) components. It is based on the service-oriented architecture (SOA), a more recent, web-oriented version of the “pipeline” architecture that has long been used in NLP for sequencing loosely-coupled linguistic analyses. The LAPPS Grid provides access to basic NLP processing tools and resources and enables pipelining such tools to create custom NLP applications, as well as composite services such as question answering and machine translation together with language resources such as mono- and multi-lingual corpora and lexicons that support NLP. The transformative aspect of the LAPPS Grid is that it orchestrates access to and deployment of language resources and processing functions available from servers around the globe and enables users to add their own language resources, services, and even service grids to satisfy their particular needs.


Automatic Domain Partitioning for Multi-Domain Learning
Di Wang | Chenyan Xiong | William Yang Wang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing