Wen Wang

Also published as: W. Wang


2022

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
Linhan Zhang | Qian Chen | Wen Wang | Chong Deng | ShiLiang Zhang | Bing Li | Wei Wang | Xin Cao
Findings of the Association for Computational Linguistics: ACL 2022

Keyphrase extraction (KPE) automatically extracts phrases from a document that provide a concise summary of its core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art methods select candidate keyphrases based on the similarity between learned representations of the candidates and the document. They suffer performance degradation on long documents because the discrepancy in sequence lengths causes a mismatch between the representations of keyphrase candidates and the document. In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document. We further develop a KPE-oriented BERT (KPEBERT) model by proposing a novel self-supervised contrastive learning method, which is more compatible with MDERank than vanilla BERT. Comprehensive evaluations on six KPE benchmarks demonstrate that the proposed MDERank outperforms the state-of-the-art unsupervised KPE approach by an average of 1.80 F1@15. MDERank further benefits from KPEBERT and overall achieves an average improvement of 3.53 F1@15 over SIFRank.
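
The masking-and-ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function below is a toy bag-of-words stand-in for a BERT-style document encoder, the function names are hypothetical, and the sketch assumes that the candidate whose masking lowers the similarity to the original document the most is ranked first.

```python
import math
import re
from collections import Counter

MASK = "[MASK]"  # assumed mask token, one per masked word


def embed(text):
    """Toy stand-in for a document encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"\[mask\]|\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mask_candidate(document, candidate):
    """Replace every occurrence of the candidate phrase with mask tokens."""
    replacement = " ".join([MASK] * len(candidate.split()))
    pattern = re.compile(r"\b" + re.escape(candidate) + r"\b", re.IGNORECASE)
    return pattern.sub(replacement, document)


def rank_candidates(document, candidates):
    """Order candidates by similarity between the document and its masked copy.

    Lowest similarity first: masking an important phrase is assumed to
    change the document embedding the most.
    """
    doc_vec = embed(document)
    scored = [
        (cosine(doc_vec, embed(mask_candidate(document, c))), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored)]


document = (
    "Keyphrase extraction finds keyphrases. "
    "Keyphrase extraction helps retrieval and retrieval helps search."
)
ranking = rank_candidates(document, ["search", "keyphrase extraction"])
print(ranking)  # ['keyphrase extraction', 'search']
```

Because both sides of the comparison are whole documents, the two embeddings have comparable sequence lengths, which is the mismatch the abstract sets out to avoid.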

2021

Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
Yongliang Shen | Xinyin Ma | Zeqi Tan | Shuai Zhang | Wen Wang | Weiming Lu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Named entity recognition (NER) is a well-studied task in natural language processing. Traditional NER research deals only with flat entities and ignores nested entities. Span-based methods treat entity recognition as a span classification task. Although these methods have the innate ability to handle nested NER, they suffer from high computational cost, ignorance of boundary information, under-utilization of spans that partially match entities, and difficulties in recognizing long entities. To tackle these issues, we propose a two-stage entity identifier. First, we generate span proposals by filtering and boundary regression on the seed spans to locate the entities; then we label the boundary-adjusted span proposals with the corresponding categories. Our method effectively utilizes the boundary information of entities and partially matched spans during training. Through boundary regression, entities of any length can theoretically be covered, which improves the ability to recognize long entities. In addition, many low-quality seed spans are filtered out in the first stage, which reduces the time complexity of inference. Experiments on nested NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.
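
The two-stage data flow described in this abstract (filter seed spans, adjust their boundaries, then label the proposals) can be sketched as follows. This is only an illustration under stated assumptions: in the paper the filter, the boundary regressor, and the classifier are learned models, whereas here they are hypothetical callables (`score_fn`, `offset_fn`, `classify_fn`) backed by a toy oracle over hand-written gold spans, and the seed widths and threshold are arbitrary.

```python
def seed_spans(n_tokens, widths=(1, 2, 3)):
    """Enumerate fixed-width seed spans as inclusive (start, end) indices."""
    return [(s, s + w - 1) for w in widths for s in range(n_tokens - w + 1)]


def iou(a, b):
    """Token-level intersection over union of two inclusive spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def locate(tokens, score_fn, offset_fn, threshold=0.5):
    """Stage 1: drop low-scoring seeds, then shift boundaries of the rest."""
    proposals = set()
    for span in seed_spans(len(tokens)):
        if score_fn(tokens, span) < threshold:
            continue  # low-quality seed span is filtered out
        d_start, d_end = offset_fn(tokens, span)
        start = max(0, span[0] + d_start)
        end = min(len(tokens) - 1, span[1] + d_end)
        if start <= end:
            proposals.add((start, end))
    return sorted(proposals)


def label(tokens, proposals, classify_fn):
    """Stage 2: assign a category to each boundary-adjusted proposal."""
    return [(span, classify_fn(tokens, span)) for span in proposals]


# Toy oracle standing in for the learned components (assumption).
tokens = "the University of Edinburgh in Scotland".split()
gold = {(1, 3): "ORG", (3, 3): "LOC"}  # nested: LOC inside ORG


def best_gold(span):
    return max(gold, key=lambda g: iou(span, g))


def score_fn(tokens, span):
    return iou(span, best_gold(span))


def offset_fn(tokens, span):
    g = best_gold(span)
    return g[0] - span[0], g[1] - span[1]


def classify_fn(tokens, span):
    return gold.get(span, "O")


proposals = locate(tokens, score_fn, offset_fn)
entities = label(tokens, proposals, classify_fn)
print(entities)  # [((1, 3), 'ORG'), ((3, 3), 'LOC')]
```

Seeds that only partially overlap an entity, such as (1, 2) here, survive the filter and are moved onto the entity boundary, so a span longer than any seed width could still be reached through the offsets.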

2015

Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs
Katrin Kirchhoff | Yik-Cheung Tam | Colleen Richey | Wen Wang
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2013

A Cross-language Study on Automatic Speech Disfluency Detection
Wen Wang | Andreas Stolcke | Jiahong Yuan | Mark Liberman
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Name-aware Machine Translation
Haibo Li | Jing Zheng | Heng Ji | Qi Li | Wen Wang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

N-Best Rescoring Based on Pitch-accent Patterns
Je Hun Jeon | Wen Wang | Yang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Detection of Agreement and Disagreement in Broadcast Conversations
Wen Wang | Sibel Yaman | Kristin Precoda | Colleen Richey | Geoffrey Raymond
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

Anchored Speech Recognition for Question Answering
Sibel Yaman | Gokhan Tur | Dimitra Vergyri | Dilek Hakkani-Tur | Mary Harper | Wen Wang
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

2008

Improving Alignments for Better Confusion Networks for Combining Machine Translation Systems
Necip Fazil Ayan | Jing Zheng | Wen Wang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Mandarin Part-of-Speech Tagging and Discriminative Reranking
Zhongqiang Huang | Mary Harper | Wen Wang
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2004

A Statistical Constraint Dependency Grammar (CDG) Parser
Wen Wang | Mary P. Harper
Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

2002

The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources
Wen Wang | Mary P. Harper
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2000

A Question Answering System Developed as a Project in a Natural Language Processing Course
W. Wang | J. Auer | R. Parasuraman | I. Zubarev | D. Brandyberry | M. P. Harper
ANLP-NAACL 2000 Workshop: Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems

The Effectiveness of Corpus-Induced Dependency Grammars for Post-processing Speech
M. P. Harper | C. M. White | W. Wang | M. T. Johnson | R. A. Helzerman
1st Meeting of the North American Chapter of the Association for Computational Linguistics