Jian Wu


2022

DialMed: A Dataset for Dialogue-based Medication Recommendation
Zhenfeng He | Yuqiang Han | Zhenqiu Ouyang | Wei Gao | Hongxu Chen | Guandong Xu | Jian Wu
Proceedings of the 29th International Conference on Computational Linguistics

Medication recommendation is a crucial task for intelligent healthcare systems. Previous studies mainly recommend medications with electronic health records (EHRs). However, some details of doctor-patient interactions that are essential for automatic medication recommendation may be ignored or omitted in EHRs. Therefore, we make the first attempt to recommend medications from conversations between doctors and patients. In this work, we construct DIALMED, the first high-quality dataset for the medical dialogue-based medication recommendation task. It contains 11,996 medical dialogues related to 16 common diseases from 3 departments and 70 corresponding common medications. Furthermore, we propose a Dialogue structure and Disease knowledge aware Network (DDN), in which a QA Dialogue Graph mechanism is designed to model the dialogue structure and a knowledge graph is used to introduce external disease knowledge. The extensive experimental results demonstrate that the proposed method is a promising solution for recommending medications from medical dialogues. The dataset and code are available at https://github.com/f-window/DialMed.

2021

Extractive Research Slide Generation Using Windowed Labeling Ranking
Athar Sefid | Prasenjit Mitra | Jian Wu | C Lee Giles
Proceedings of the Second Workshop on Scholarly Document Processing

Presentation slides generated from original research papers provide an efficient way to present research innovations. Manually generating presentation slides is labor-intensive. We propose a method to automatically generate slides for scientific articles based on a corpus of 5000 paper-slide pairs compiled from conference proceedings websites. The sentence labeling module of our method is based on SummaRuNNer, a neural sequence model for extractive summarization. Instead of ranking sentences based on semantic similarities in the whole document, our algorithm measures the importance and novelty of sentences by combining semantic and lexical features within a sentence window. Our method outperforms several baseline methods, including SummaRuNNer, by a significant margin in terms of ROUGE score.

2020

Acknowledgement Entity Recognition in CORD-19 Papers
Jian Wu | Pei Wang | Xin Wei | Sarah Rajtmajer | C. Lee Giles | Christopher Griffin
Proceedings of the First Workshop on Scholarly Document Processing

Acknowledgements are ubiquitous in scholarly papers. Existing acknowledgement entity recognition methods assume all named entities are acknowledged. Here, we examine the nuances between acknowledged and named entities by analyzing sentence structure. We develop an acknowledgement extraction system, AckExtract, based on open-source text mining software, and evaluate our method using manually labeled data. AckExtract uses the PDF of a scholarly paper as input and outputs acknowledgement entities. Results show an overall performance of F1=0.92. We built a supplementary database by linking CORD-19 papers with acknowledgement entities extracted by AckExtract, including persons and organizations, and find that only up to 50–60% of named entities are actually acknowledged. We further analyze chronological trends of acknowledgement entities in CORD-19 papers. All code and labeled data are publicly available at https://github.com/lamps-lab/ackextract.

SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning
Chenrui Guo | Haoran Cui | Li Zhang | Jiamin Wang | Wei Lu | Jian Wu
Proceedings of the 8th International Workshop on Mining Scientific Publications

We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers from the ACL Anthology. The model with 19 features achieves F1=85.6%. SCC supports PDF, XML, and JSON files out of the box, provided that they conform to certain schemas. The API supports single document processing and batch processing in parallel. It takes about 12–45 seconds on average, depending on the format, to process a document on a dedicated server with 6 multithreaded cores. Using SCC, we extracted 11.8 million citation context sentences from ~33.3k PMC papers in the CORD-19 dataset, released on June 13, 2020. We will continue to contribute supplementary data to CORD-19 and other datasets. The source code is released at https://gitee.com/irlab/SmartCiteCon.

2015

Tibetan Unknown Word Identification from News Corpora for Supporting Lexicon-based Tibetan Word Segmentation
Minghua Nuo | Huidan Liu | Congjun Long | Jian Wu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Zipf’s Law and Statistical Data on Modern Tibetan
Huidan Liu | Minghua Nuo | Jian Wu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2012

Building Large Scale Text Corpus for Tibetan Natural Language Processing by Extracting Text from Web Pages
Huidan Liu | Minghua Nuo | Jian Wu | Yeping He
Proceedings of the 10th Workshop on Asian Language Resources

Tibetan Base Noun Phrase Identification Framework Based on Chinese-Tibetan Sentence Aligned Corpus
Ming Hua Nuo | Hui Dan Liu | Wei Na Zhao | Long Long Ma | Jian Wu | Zhi Ming Ding
Proceedings of COLING 2012

2011

Compression Methods by Code Mapping and Code Dividing for Chinese Dictionary Stored in a Double-Array Trie
Huidan Liu | Minghua Nuo | Longlong Ma | Jian Wu | Yeping He
Proceedings of 5th International Joint Conference on Natural Language Processing

Tibetan Word Segmentation as Syllable Tagging Using Conditional Random Field
Huidan Liu | Minghua Nuo | Longlong Ma | Jian Wu | Yeping He
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

Tibetan Number Identification Based on Classification of Number Components in Tibetan Word Segmentation
Huidan Liu | Weina Zhao | Minghua Nuo | Li Jiang | Jian Wu | Yeping He
Coling 2010: Posters