Jianing Zhou


2022

Automatic Patient Note Assessment without Strong Supervision
Jianing Zhou | Vyom Nayan Thakkar | Rachel Yudkowsky | Suma Bhat | William F. Bond
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)

Training of physicians requires significant practice writing patient notes that document the patient’s medical and health information and the physician’s diagnostic reasoning. Assessing patient notes and providing feedback on them requires experienced faculty, consumes significant amounts of time, and delays feedback to learners. Grading patient notes is thus a tedious and expensive process for humans that could be improved with the addition of natural language processing. However, the large manual effort required to create labeled datasets increases the challenge, particularly when test cases change. Therefore, traditional supervised NLP methods relying on labeled datasets are impractical in such a low-resource scenario. In our work, we propose an unsupervised framework as a simple baseline and a weakly supervised method utilizing transfer learning for automatic assessment of patient notes under a low-resource scenario. Experiments on our self-collected datasets show that our weakly supervised method can provide reliable assessment of patient notes, with an accuracy of 0.92.

2021

Paraphrase Generation: A Survey of the State of the Art
Jianing Zhou | Suma Bhat
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper focuses on paraphrase generation, which is a widely studied natural language generation task in NLP. With the development of neural models, paraphrase generation research has exhibited a gradual shift to neural methods in recent years. This has provided architectures for contextualized representation of an input text and for generating fluent, diverse, and human-like paraphrases. This paper surveys various approaches to paraphrase generation with a main focus on neural methods.

PIE: A Parallel Idiomatic Expression Corpus for Idiomatic Sentence Generation and Paraphrasing
Jianing Zhou | Hongyu Gong | Suma Bhat
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

Idiomatic expressions (IEs) play an important role in natural language, and have long been a “pain in the neck” for NLP systems. Despite this, text generation tasks related to IEs remain largely under-explored. In this paper, we propose two new tasks of idiomatic sentence generation and paraphrasing to fill this research gap. As the primary resource for our tasks, we introduce a curated dataset of 823 IEs and a parallel corpus that pairs sentences containing them with the same sentences in which the IEs are replaced by their literal paraphrases. We benchmark existing deep learning models, which have state-of-the-art performance on related tasks, using automated and manual evaluation with our dataset to inspire further research on our proposed tasks. By establishing baseline models, we pave the way for more comprehensive and accurate modeling of IEs, both for generation and paraphrasing.

2019

Multiple Character Embeddings for Chinese Word Segmentation
Jianing Zhou | Jingkang Wang | Gongshen Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Chinese word segmentation (CWS) is often regarded as a character-based sequence labeling task in most current works, which have achieved great success with the help of powerful neural networks. However, these works neglect an important clue: Chinese characters incorporate both semantic and phonetic meanings. In this paper, we introduce multiple character embeddings including Pinyin Romanization and Wubi Input, both of which are easily accessible and effective in depicting semantics of characters. We propose a novel shared Bi-LSTM-CRF model to fuse linguistic features efficiently by sharing the LSTM network during the training procedure. Extensive experiments on five corpora show that extra embeddings help obtain a significant improvement in labeling accuracy. Specifically, we achieve state-of-the-art performance on the AS and CityU corpora, with F1 scores of 96.9 and 97.3, respectively, without leveraging any external lexical resources.