Zhen Wang


2021

MedAI at SemEval-2021 Task 5: Start-to-end Tagging Framework for Toxic Spans Detection
Zhen Wang | Hongjie Fan | Junfei Liu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes the system submitted to SemEval 2021 Task 5: Toxic Spans Detection. The task evaluates systems that detect the spans that make a text toxic, when detecting such spans is possible. To address the possibly multi-span detection problem, we develop a start-to-end tagging framework on top of a RoBERTa-based language model. In addition, we design a custom loss function that takes distance into account. In comparison to other participating teams, our system achieved a 69.03% F1 score, which is slightly lower (-1.8 and -1.73) than the top 1 (70.83%) and top 2 (70.77%) systems, respectively.

2020

Diversify Question Generation with Continuous Content Selectors and Question Type Modeling
Zhen Wang | Siwei Rao | Jie Zhang | Zhen Qin | Guangjian Tian | Jun Wang
Findings of the Association for Computational Linguistics: EMNLP 2020

Generating questions based on answers and relevant contexts is a challenging task. Recent work mainly pays attention to the quality of a single generated question. However, question generation is actually a one-to-many problem, as it is possible to raise questions with different focuses on contexts and various means of expression. In this paper, we explore the diversity of question generation and propose methods that address both of these aspects. Specifically, we relate contextual focuses to content selectors, which are modeled by a continuous latent variable using a conditional variational auto-encoder (CVAE). In the realization of the CVAE, a multimodal prior distribution is adopted to allow for more diverse content selectors. To take into account various means of expression, question types are explicitly modeled and a diversity-promoting algorithm is further proposed. Experimental results on public datasets show that our proposed method can significantly improve the diversity of generated questions, especially from the perspective of using different question types. Overall, our proposed method achieves a better trade-off between generation quality and diversity compared with existing approaches.

Rationalizing Medical Relation Prediction from Corpus-level Statistics
Zhen Wang | Jennifer Lee | Simon Lin | Huan Sun
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The interpretability of machine learning models is becoming increasingly important, especially in the medical domain. Aiming to shed some light on how to rationalize medical relation prediction, we present a new interpretable framework inspired by existing theories on how human memory works, e.g., theories of recall and recognition. To predict the relation between two entities given corpus-level statistics, i.e., a global co-occurrence graph of a clinical text corpus, we first recall rich contexts associated with the target entities, and then recognize relational interactions between these contexts to form model rationales, which contribute to the final prediction. We conduct experiments on a real-world public clinical dataset and show that our framework can not only achieve competitive predictive performance against a comprehensive list of neural baseline models, but also present rationales to justify its prediction. We further collaborate closely with medical experts to verify the usefulness of our model rationales for clinical decision making.

2018

Joint Training of Candidate Extraction and Answer Selection for Reading Comprehension
Zhen Wang | Jiachen Liu | Xinyan Xiao | Yajuan Lyu | Tian Wu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While sophisticated neural techniques have been developed for reading comprehension, most approaches model each answer independently, ignoring its relations with other answer candidates. This problem can be even worse in open-domain scenarios, where candidates from multiple passages should be combined to answer a single question. In this paper, we formulate reading comprehension as an extract-then-select two-stage procedure. We first extract answer candidates from passages, then select the final answer by combining information from all the candidates. Furthermore, we regard candidate extraction as a latent variable and train the two-stage process jointly with reinforcement learning. As a result, our approach significantly improves on the state-of-the-art performance on two challenging open-domain reading comprehension datasets. Further analysis demonstrates the effectiveness of our model components, especially the information fusion of all the candidates and the joint training of the extract-then-select procedure.

2015

Aligning Knowledge and Text Embeddings by Entity Descriptions
Huaping Zhong | Jianwen Zhang | Zhen Wang | Hai Wan | Zheng Chen
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Chinese Semantic Role Labeling with Bidirectional Recurrent Neural Networks
Zhen Wang | Tingsong Jiang | Baobao Chang | Zhifang Sui
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

A mixed approach for Chinese word segmentation
Zhen Wang
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

Extraction system for Personal Attributes Extraction of CLP2014
Zhen Wang
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

Knowledge Graph and Text Jointly Embedding
Zhen Wang | Jianwen Zhang | Jianlin Feng | Zheng Chen
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

A Mixed Morpho-Syntactic and Statistical Approach to Chinese Named Entity Recognition (Une approche mixte morpho-syntaxique et statistique pour la reconnaissance d’entités nommées en langue chinoise) [in French]
Zhen Wang
Proceedings of RECITAL 2013

2004

Aligning Bilingual Corpora Using Sentences Location Information
Weigang Li | Ting Liu | Zhen Wang | Sheng Li
Proceedings of the Third SIGHAN Workshop on Chinese Language Processing