Xuan Wang


2024

pdf
AI for Science in the Era of Large Language Models
Zhenyu Bi | Minghao Xu | Jian Tang | Xuan Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

The capabilities of AI in the realm of science span a wide spectrum, from the atomic level, where it solves partial differential equations for quantum systems, to the molecular level, predicting chemical or protein structures, and even extending to societal predictions like infectious disease outbreaks. Recent advancements in large language models (LLMs), exemplified by models like ChatGPT, have showcased significant prowess in tasks involving natural language, such as translating languages, constructing chatbots, and answering questions. When we consider scientific data, we notice a resemblance to natural language in terms of sequences – scientific literature and health records presented as text, bio-omics data arranged in sequences, or sensor data like brain signals. The question arises: Can we harness the potential of these recent LLMs to drive scientific progress? In this tutorial, we will explore the application of large language models to three crucial categories of scientific data: 1) textual data, 2) biomedical sequences, and 3) brain signals. Furthermore, we will delve into LLMs’ challenges in scientific research, including ensuring trustworthiness, achieving personalization, and adapting to multi-modal data representation.

pdf
PromptRE: Weakly-Supervised Document-Level Relation Extraction via Prompting-Based Data Programming
Chufan Gao | Xulin Fan | Jimeng Sun | Xuan Wang
Proceedings of the 1st Workshop on Towards Knowledgeable Language Models (KnowLLM 2024)

Relation extraction aims to classify the relationships between two entities into pre-defined categories. While previous research has mainly focused on sentence-level relation extraction, recent studies have expanded the scope to document-level relation extraction. Traditional relation extraction methods heavily rely on human-annotated training data, which is time-consuming and labor-intensive. To mitigate the need for manual annotation, recent weakly-supervised approaches have been developed for sentence-level relation extraction while limited work has been done on document-level relation extraction. Weakly-supervised document-level relation extraction faces significant challenges due to an imbalanced number “no relation” instances and the failure of directly probing pretrained large language models for document relation extraction. To address these challenges, we propose PromptRE, a novel weakly-supervised document-level relation extraction method that combines prompting-based techniques with data programming. Furthermore, PromptRE incorporates the label distribution and entity types as prior knowledge to improve the performance. By leveraging the strengths of both prompting and data programming, PromptRE achieves improved performance in relation classification and effectively handles the “no relation” problem. Experimental results on ReDocRED, a benchmark dataset for document-level relation extraction, demonstrate the superiority of PromptRE over baseline approaches.

pdf
TriageAgent: Towards Better Multi-Agents Collaborations for Large Language Model-Based Clinical Triage
Meng Lu | Brandon Ho | Dennis Ren | Xuan Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

The global escalation in emergency department patient visits poses significant challenges to efficient clinical management, particularly in clinical triage. Traditionally managed by human professionals, clinical triage is susceptible to substantial variability and high workloads. Although large language models (LLMs) demonstrate promising reasoning and understanding capabilities, directly applying them to clinical triage remains challenging due to the complex and dynamic nature of the clinical triage task. To address these issues, we introduce TriageAgent, a novel heterogeneous multi-agent framework designed to enhance collaborative decision-making in clinical triage. TriageAgent leverages LLMs for role-playing, incorporating self-confidence and early-stopping mechanisms in multi-round discussions to improve document reasoning and classification precision for triage tasks. In addition, TriageAgent employs the medical Emergency Severity Index (ESI) handbook through a retrieval-augmented generation (RAG) approach to provide precise clinical knowledge and integrates both coarse- and fine-grained ESI-level predictions in the decision-making process. Extensive experiments demonstrate that TriageAgent outperforms state-of-the-art LLM-based methods on three clinical triage test sets. Furthermore, we have released the first public benchmark dataset for clinical triage with corresponding ESI levels and human expert performance for comparison.

pdf
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
Yong Lin | Skyler Seto | Maartje Ter Hoeve | Katherine Metcalf | Barry-John Theobald | Xuan Wang | Yizhe Zhang | Chen Huang | Tong Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024

Reinforcement Learning from Human Feedback (RLHF) is an effective approach for aligning language models to human preferences. Central to RLHF is learning a reward function for scoring human preferences. Two main approaches for learning a reward model are 1) training an EXplicit Reward Model (EXRM) as in RLHF, and 2) using an implicit reward learned from preference data through methods such as Direct Preference Optimization (DPO). Prior work has shown that the implicit reward model of DPO (denoted as DPORM) can approximate an EXRM on the limit infinite samples. However, it is unclear how effective is DPORM in practice. DPORM’s effectiveness directly implies the optimality of learned policy of DPO and also has practical implication for more advanced alignment methods, such as iterative DPO. We compare the accuracy at distinguishing preferred and rejected answers using both DPORM and EXRM. Our findings indicate that even though DPORM can fit the training dataset, it generalizes less effective than EXRM, especially when the validation datasets contain distributional shifts. Across five out-of-distribution settings, DPORM has a mean drop in accuracy of 3% and a maximum drop of 7%. These findings highlight that DPORM has limited generalization ability and substantiates the integration of an explicit reward model in iterative DPO approaches.

pdf
TTM-RE: Memory-Augmented Document-Level Relation Extraction
Chufan Gao | Xuan Wang | Jimeng Sun
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Document-level relation extraction aims to categorize the association between any two entities within a document.We find that previous methods for document-level relation extraction are ineffective in exploiting the full potential of large amounts of training data with varied noise levels. For example, in the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-quality, distantly supervised training data generally do not perform better than those trained solely on the smaller, high-quality, human-annotated training data. To unlock the full potential of large-scale noisy training data for document-level relation extraction, we propose TTM-RE, a novel approach that integrates a trainable memory module, known as the Token Turing Machine, with a noisy-robust loss function that accounts for the positive-unlabeled setting. The trainable memory module enhances knowledge extraction from the large-scale noisy training dataset through an explicit learning of the memory tokens and a soft integration of the learned memory tokens into the input representation, thereby improving the model’s effectiveness for the final relation classification. Extensive experiments on ReDocRED, a benchmark dataset for document-level relation extraction, reveal that TTM-RE achieves state-of-the-art performance (with an absolute F1 score improvement of over 3%). Ablation studies further illustrate the superiority of TTM-RE in other domains (the ChemDisGene dataset in the biomedical domain) and under highly unlabeled settings.

2023

pdf
Text Augmented Open Knowledge Graph Completion via Pre-Trained Language Models
Pengcheng Jiang | Shivam Agarwal | Bowen Jin | Xuan Wang | Jimeng Sun | Jiawei Han
Findings of the Association for Computational Linguistics: ACL 2023

The mission of open knowledge graph (KG) completion is to draw new findings from known facts. Existing works that augment KG completion require either (1) factual triples to enlarge the graph reasoning space or (2) manually designed prompts to extract knowledge from a pre-trained language model (PLM), exhibiting limited performance and requiring expensive efforts from experts. To this end, we propose TagReal that automatically generates quality query prompts and retrieves support information from large text corpora to probe knowledge from PLM for KG completion. The results show that TagReal achieves state-of-the-art performance on two benchmark datasets. We find that TagReal has superb performance even with limited training data, outperforming existing embedding-based, graph-based, and PLM-based methods.

pdf
ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision
Ming Zhong | Siru Ouyang | Minhao Jiang | Vivian Hu | Yizhu Jiao | Xuan Wang | Jiawei Han
Findings of the Association for Computational Linguistics: ACL 2023

Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design. Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts. Consequently, the scarcity of sufficient training data poses an obstacle to the progress of related models in this domain. In this paper, we propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions. Additionally, we adopt synthetic data from patent records as distant supervision to incorporate domain knowledge into the model. Experiments demonstrate that ReactIE achieves substantial improvements and outperforms all existing baselines.

pdf
MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities
Priyanka Kargupta | Tanay Komarlu | Susik Yoon | Xuan Wang | Jiawei Han
Findings of the Association for Computational Linguistics: EMNLP 2023

Text classification is essential for organizing unstructured text. Traditional methods rely on human annotations or, more recently, a set of class seed words for supervision, which can be costly, particularly for specialized or emerging domains. To address this, using class surface names alone as extremely weak supervision has been proposed. However, existing approaches treat different levels of text granularity (documents, sentences, or words) independently, disregarding inter-granularity class disagreements and the context identifiable exclusively through joint extraction. In order to tackle these issues, we introduce MEGClass, an extremely weakly-supervised text classification method that leverages Mutually-Enhancing Text Granularities. MEGClass utilizes coarse- and fine-grained context signals obtained by jointly considering a document’s most class-indicative words and sentences. This approach enables the learning of a contextualized document representation that captures the most discriminative class indicators. By preserving the heterogeneity of potential classes, MEGClass can select the most informative class-indicative documents as iterative feedback to enhance the initial word-based class representations and ultimately fine-tune a pre-trained text classifier. Extensive experiments on seven benchmark datasets demonstrate that MEGClass outperforms other weakly and extremely weakly supervised methods.

pdf
DRGCoder: Explainable Clinical Coding for the Early Prediction of Diagnostic-Related Groups
Daniel Hajialigol | Derek Kaknes | Tanner Barbour | Daphne Yao | Chris North | Jimeng Sun | David Liem | Xuan Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Medical claim coding is the process of transforming medical records, usually presented as free texts written by clinicians, or discharge summaries, into structured codes in a classification system such as ICD-10 (International Classification of Diseases, Tenth Revision) or DRG (Diagnosis-Related Group) codes. This process is essential for medical billing and transitional care; however, manual coding is time-consuming, error-prone, and expensive. To solve these issues, we propose DRGCoder, an explainability-enhanced clinical claim coding system for the early prediction of medical severity DRGs (MS-DRGs), a classification system that categorizes patients’ hospital stays into various DRG groups based on the severity of illness and mortality risk. The DRGCoder framework introduces a novel multi-task Transformer model for MS-DRG prediction, modeling both the DRG labels of the discharge summaries and the important, or salient words within he discharge summaries. We allow users to inspect DRGCoder’s reasoning by visualizing the weights for each word of the input. Additionally, DRGCoder allows users to identify diseases within discharge summaries and compare across multiple discharge summaries. Our demo is available at https://huggingface.co/spaces/danielhajialigol/DRGCoder. A video demonstrating the demo can be found at https://www.youtube.com/watch?v=pcdiG6VwqlA

pdf
Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data
Ming Zhong | Siru Ouyang | Yizhu Jiao | Priyanka Kargupta | Leo Luo | Yanzhen Shen | Bobby Zhou | Xianrui Zhong | Xuan Liu | Hongxiang Li | Jinfeng Xiao | Minhao Jiang | Vivian Hu | Xuan Wang | Heng Ji | Martin Burke | Huimin Zhao | Jiawei Han
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Chemical reactions, as a core entity in the realm of chemistry, hold crucial implications in diverse areas ranging from hands-on laboratory research to advanced computational drug design. Despite a burgeoning interest in employing NLP techniques to extract these reactions, aligning this task with the real-world requirements of chemistry practitioners remains an ongoing challenge. In this paper, we present Reaction Miner, a system specifically designed to interact with raw scientific literature, delivering precise and more informative chemical reactions. Going beyond mere extraction, Reaction Miner integrates a holistic workflow: it accepts PDF files as input, bypassing the need for pre-processing and bolstering user accessibility. Subsequently, a text segmentation module ensures that the refined text encapsulates complete chemical reactions, augmenting the accuracy of extraction. Moreover, Reaction Miner broadens the scope of existing pre-defined reaction roles, including vital attributes previously neglected, thereby offering a more comprehensive depiction of chemical reactions. Evaluations conducted by chemistry domain users highlight the efficacy of each module in our system, demonstrating Reaction Miner as a powerful tool in this field.

2022

pdf
Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds
Yu Zhang | Yu Meng | Xuan Wang | Sheng Wang | Jiawei Han
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Discovering latent topics from text corpora has been studied for decades. Many existing topic models adopt a fully unsupervised setting, and their discovered topics may not cater to users’ particular interests due to their inability of leveraging user guidance. Although there exist seed-guided topic discovery approaches that leverage user-provided seeds to discover topic-representative terms, they are less concerned with two factors: (1) the existence of out-of-vocabulary seeds and (2) the power of pre-trained language models (PLMs). In this paper, we generalize the task of seed-guided topic discovery to allow out-of-vocabulary seeds. We propose a novel framework, named SeeTopic, wherein the general knowledge of PLMs and the local semantics learned from the input corpus can mutually benefit each other. Experiments on three real datasets from different domains demonstrate the effectiveness of SeeTopic in terms of topic coherence, accuracy, and diversity.

2021

pdf
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
Qingyun Wang | Manling Li | Xuan Wang | Nikolaus Parulian | Guangxing Han | Jiawei Ma | Jingxuan Tu | Ying Lin | Ranran Haoran Zhang | Weili Liu | Aabhas Chauhan | Yingjun Guan | Bangzheng Li | Ruisong Li | Xiangchen Song | Yi Fung | Heng Ji | Jiawei Han | Shih-Fu Chang | James Pustejovsky | Jasmine Rah | David Liem | Ahmed ELsayed | Martha Palmer | Clare Voss | Cynthia Schneider | Boyan Onyshkevych
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

To combat COVID-19, both clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in literature to understand the disease mechanism and the related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence. All of the data, KGs, reports.

pdf
Noise Robust Named Entity Understanding for Voice Assistants
Deepak Muralidharan | Joel Ruben Antony Moniz | Sida Gao | Xiao Yang | Justine Kao | Stephen Pulman | Atish Kothari | Ray Shen | Yinying Pan | Vivek Kaul | Mubarak Seyed Ibrahim | Gang Xiang | Nan Dun | Yidan Zhou | Andy O | Yuan Zhang | Pooja Chitkara | Xuan Wang | Alkesh Patel | Kushal Tayal | Roger Zheng | Peter Grasch | Jason D Williams | Lin Li
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries. In this paper, we propose a novel architecture that jointly solves the NER and EL tasks by combining them in a joint reranking module. We show that our proposed framework improves NER accuracy by up to 3.13% and EL accuracy by up to 3.6% in F1 score. The features used also lead to better accuracies in other natural language understanding tasks, such as domain classification and semantic parsing.

pdf
ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision
Xuan Wang | Vivian Hu | Xiangchen Song | Shweta Garg | Jinfeng Xiao | Jiawei Han
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Scientific literature analysis needs fine-grained named entity recognition (NER) to provide a wide range of information for scientific discovery. For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types, making consistent and accurate annotation difficult even for crowds of domain experts. On the other hand, domain-specific ontologies and knowledge bases (KBs) can be easily accessed, constructed, or integrated, which makes distant supervision realistic for fine-grained chemistry NER. In distant supervision, training labels are generated by matching mentions in a document with the concepts in the knowledge bases (KBs). However, this kind of KB-matching suffers from two major challenges: incomplete annotation and noisy annotation. We propose ChemNER, an ontology-guided, distantly-supervised method for fine-grained chemistry NER to tackle these challenges. It leverages the chemistry type ontology structure to generate distant labels with novel methods of flexible KB-matching and ontology-guided multi-type disambiguation. It significantly improves the distant label generation for the subsequent sequence labeling model training. We also provide an expert-labeled, chemistry NER dataset with 62 fine-grained chemistry types (e.g., chemical compounds and chemical reactions). Experimental results show that ChemNER is highly effective, outperforming substantially the state-of-the-art NER methods (with .25 absolute F1 score improvement).

pdf
Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training
Yu Meng | Yunyi Zhang | Jiaxin Huang | Xuan Wang | Yu Zhang | Heng Ji | Jiawei Han
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base. The biggest challenge of distantly-supervised NER is that the distant supervision may induce incomplete and noisy labels, rendering the straightforward application of supervised learning ineffective. In this paper, we propose (1) a noise-robust learning scheme comprised of a new loss function and a noisy label removal step, for training NER models on distantly-labeled data, and (2) a self-training method that uses contextualized augmentations created by pre-trained language models to improve the generalization ability of the NER model. On three benchmark datasets, our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.

2020

pdf
EVIDENCEMINER: Textual Evidence Discovery for Life Sciences
Xuan Wang | Yingjun Guan | Weili Liu | Aabhas Chauhan | Enyi Jiang | Qi Li | David Liem | Dibakar Sigdel | John Caufield | Peipei Ping | Jiawei Han
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Traditional search engines for life sciences (e.g., PubMed) are designed for document retrieval and do not allow direct retrieval of specific statements. Some of these statements may serve as textual evidence that is key to tasks such as hypothesis generation and new finding validation. We present EVIDENCEMINER, a web-based system that lets users query a natural language statement and automatically retrieves textual evidence from a background corpora for life sciences. EVIDENCEMINER is constructed in a completely automated way without any human effort for training data annotation. It is supported by novel data-driven methods for distantly supervised named entity recognition and open information extraction. The entities and patterns are pre-computed and indexed offline to support fast online evidence retrieval. The annotation results are also highlighted in the original document for better visualization. EVIDENCEMINER also includes analytic functionalities such as the most frequent entity and relation summarization. EVIDENCEMINER can help scientists uncover important research issues, leading to more effective research and more in-depth quantitative analysis. The system of EVIDENCEMINER is available at https://evidenceminer.firebaseapp.com/.

2018

pdf
Variational Autoregressive Decoder for Neural Response Generation
Jiachen Du | Wenjie Li | Yulan He | Ruifeng Xu | Lidong Bing | Xuan Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Combining the virtues of probability graphic models and neural networks, Conditional Variational Auto-encoder (CVAE) has shown promising performance in applications such as response generation. However, existing CVAE-based models often generate responses from a single latent variable which may not be sufficient to model high variability in responses. To solve this problem, we propose a novel model that sequentially introduces a series of latent variables to condition the generation of each word in the response sequence. In addition, the approximate posteriors of these latent variables are augmented with a backward Recurrent Neural Network (RNN), which allows the latent variables to capture long-term dependencies of future tokens in generation. To facilitate training, we supplement our model with an auxiliary objective that predicts the subsequent bag of words. Empirical experiments conducted on Opensubtitle and Reddit datasets show that the proposed model leads to significant improvement on both relevance and diversity over state-of-the-art baselines.

2017

pdf
XJNLP at SemEval-2017 Task 12: Clinical temporal information ex-traction with a Hybrid Model
Yu Long | Zhijing Li | Xuan Wang | Chen Li
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Temporality is crucial in understanding the course of clinical events from a patient’s electronic health recordsand temporal processing is becoming more and more important for improving access to content. SemEval 2017 Task 12 (Clinical TempEval) addressed this challenge using the THYME corpus, a corpus of clinical narratives annotated with a schema based on TimeML2 guidelines. We developed and evaluated approaches for: extraction of temporal expressions (TIMEX3) and EVENTs; EVENT attributes; document-time relations. Our approach is a hybrid model which is based on rule based methods, semi-supervised learning, and semantic features with addition of manually crafted rules.

pdf
Life-iNet: A Structured Network-Based Knowledge Exploration and Analytics System for Life Sciences
Xiang Ren | Jiaming Shen | Meng Qu | Xuan Wang | Zeqiu Wu | Qi Zhu | Meng Jiang | Fangbo Tao | Saurabh Sinha | David Liem | Peipei Ping | Richard Weinshilboum | Jiawei Han
Proceedings of ACL 2017, System Demonstrations

2015

pdf
Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering
Tao Chen | Ruifeng Xu | Yulan He | Xuan Wang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2012

pdf
Simple Maximum Entropy Models for Multilingual Coreference Resolution
Xinxin Li | Xuan Wang | Xingwei Liao
Joint Conference on EMNLP and CoNLL - Shared Task

pdf
A Light Weight Stemmer for Urdu Language: A Scarce Resourced Language
Sajjad Ahmad Khan | Waqas Anwar | Usama Ijaz Bajwa | Xuan Wang
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

pdf
N-gram and Gazetteer List Based Named Entity Recognition for Urdu: A Scarce Resourced Language
Faryal Jahangir | Waqas Anwar | Usama Ijaz Bajwa | Xuan Wang
Proceedings of the 10th Workshop on Asian Language Resources

pdf
Zhijun Wu: Chinese Semantic Dependency Parsing with Third-Order Features
Zhijun Wu | Xuan Wang | Xinxin Li
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf
Coreference Resolution with Loose Transitivity Constraints
Xinxin Li | Xuan Wang | Shuhan Qi
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf
Diversifying Information Needs in Results of Question Retrieval
Yaoyun Zhang | Xiaolong Wang | Xuan Wang | Ruifeng Xu | Jun Xu | ShiXi Fan
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
A Cascade Method for Detecting Hedges and their Scope in Natural Language Text
Buzhou Tang | Xiaolong Wang | Xuan Wang | Bo Yuan | Shixi Fan
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

pdf
Exploiting Rich Features for Detecting Hedges and their Scope
Xinxin Li | Jianping Shen | Xiang Gao | Xuan Wang
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

pdf
Chinese Word Segmentation based on Mixing Multiple Preprocessor and CRF
Jianping Shen | Xuan Wang | Hainan Zhao | Wenxiao Zhang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2009

pdf
A Joint Syntactic and Semantic Dependency Parsing System based on Maximum Entropy Models
Buzhou Tang | Lu Li | Xinxin Li | Xuan Wang | Xiaolong Wang
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

2008

pdf bib
Semantic Chunk Annotation for complex questions using Conditional Random Field
Shixi Fan | Yaoyun Zhang | Wing W. Y. Ng | Xuan Wang | Xiaolong Wang
Coling 2008: Proceedings of the workshop on Knowledge and Reasoning for Answering Questions

pdf
Discriminative Learning of Syntactic and Semantic Dependencies
Lu Li | Shixi Fan | Xuan Wang | Xiaolong Wang
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf
Chunking with Max-Margin Markov Networks
Buzhou Tang | Xuan Wang | Xiaolong Wang
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

Search
Co-authors