Wen-Mei Hwu

Also published as: Wen-mei Hwu


2021

pdf bib
Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach
Jie Huang | Kevin Chang | JinJun Xiong | Wen-mei Hwu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We propose to measure fine-grained domain relevance– the degree that a term is relevant to a broad (e.g., computer science) or narrow (e.g., deep learning) domain. Such measurement is crucial for many downstream tasks in natural language processing. To handle long-tail terms, we build a core-anchored semantic graph, which uses core terms with rich description information to bridge the vast remaining fringe terms semantically. To support a fine-grained domain without relying on a matching corpus for supervision, we develop hierarchical core-fringe learning, which learns core and fringe terms jointly in a semi-supervised manner contextualized in the hierarchy of the domain. To reduce expensive human efforts, we employ automatic annotation and hierarchical positive-unlabeled learning. Our approach applies to big or small domains, covers head or tail terms, and requires little human effort. Extensive experiments demonstrate that our methods outperform strong baselines and even surpass professional human performance.

pdf bib
xER: An Explainable Model for Entity Resolution using an Efficient Solution for the Clique Partitioning Problem
Samhita Vadrevu | Rakesh Nagi | JinJun Xiong | Wen-mei Hwu
Proceedings of the First Workshop on Trustworthy Natural Language Processing

In this paper, we propose a global, self- explainable solution to solve a prominent NLP problem: Entity Resolution (ER). We formu- late ER as a graph partitioning problem. Every mention of a real-world entity is represented by a node in the graph, and the pairwise sim- ilarity scores between the mentions are used to associate these nodes to exactly one clique, which represents a real-world entity in the ER domain. In this paper, we use Clique Partition- ing Problem (CPP), which is an Integer Pro- gram (IP) to formulate ER as a graph partition- ing problem and then highlight the explainable nature of this method. Since CPP is NP-Hard, we introduce an efficient solution procedure, the xER algorithm, to solve CPP as a combi- nation of finding maximal cliques in the graph and then performing generalized set packing using a novel formulation. We discuss the advantages of using xER over the traditional methods and provide the computational exper- iments and results of applying this method to ER data sets.

2020

pdf bib
Exploring Semantic Capacity of Terms
Jie Huang | Zilong Wang | Kevin Chang | Wen-mei Hwu | JinJun Xiong
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We introduce and study semantic capacity of terms. For example, the semantic capacity of artificial intelligence is higher than that of linear regression since artificial intelligence possesses a broader meaning scope. Understanding semantic capacity of terms will help many downstream tasks in natural language processing. For this purpose, we propose a two-step model to investigate semantic capacity of terms, which takes a large text corpus as input and can evaluate semantic capacity of terms if the text corpus can provide enough co-occurrence information of terms. Extensive experiments in three fields demonstrate the effectiveness and rationality of our model compared with well-designed baselines and human-level evaluations.

2019

pdf bib
Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus
Hongyu Gong | Suma Bhat | Lingfei Wu | JinJun Xiong | Wen-mei Hwu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Text style transfer rephrases a text from a source style (e.g., informal) to a target style (e.g., formal) while keeping its original meaning. Despite the success existing works have achieved using a parallel corpus for the two styles, transferring text style has proven significantly more challenging when there is no parallel training corpus. In this paper, we address this challenge by using a reinforcement-learning-based generator-evaluator architecture. Our generator employs an attention-based encoder-decoder to transfer a sentence from the source style to the target style. Our evaluator is an adversarially trained style discriminator with semantic and syntactic constraints that score the generated sentence for style, meaning preservation, and fluency. Experimental results on two different style transfer tasks–sentiment transfer, and formality transfer–show that our model outperforms state-of-the-art approaches.Furthermore, we perform a manual evaluation that demonstrates the effectiveness of the proposed method using subjective metrics of generated text quality.

pdf bib
PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space
Omer Anjum | Hongyu Gong | Suma Bhat | Wen-Mei Hwu | JinJun Xiong
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Finding the right reviewers to assess the quality of conference submissions is a time consuming process for conference organizers. Given the importance of this step, various automated reviewer-paper matching solutions have been proposed to alleviate the burden. Prior approaches including bag-of-words model and probabilistic topic model are less effective to deal with the vocabulary mismatch and partial topic overlap between the submission and reviewer. Our approach, the common topic model, jointly models the topics common to the submission and the reviewer’s profile while relying on abstract topic vectors. Experiments and insightful evaluations on two datasets demonstrate that the proposed method achieves consistent improvements compared to the state-of-the-art.