2022
Section Classification in Clinical Notes with Multi-task Transformers
Fan Zhang
|
Itay Laish
|
Ayelet Benjamini
|
Amir Feder
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)
Clinical notes are the backbone of electronic health records, often containing vital information not observed in other structured data. Unfortunately, the unstructured nature of clinical notes can lead to critical patient-related information being lost. Algorithms that organize clinical notes into distinct sections are often proposed in order to allow medical professionals to better access information in a given note. These algorithms, however, often assume a given partition over the note, and classify section types given this information. In this paper, we propose a multi-task solution for note sectioning, where a single model identifies context changes and labels each section with its medically-relevant title. Results on in-distribution (MIMIC-III) and out-of-distribution (private held-out) datasets reveal that our approach successfully identifies note sections across different hospital systems.
LUL’s WMT22 Automatic Post-Editing Shared Task Submission
Xiaoying Huang
|
Xingrui Lou
|
Fan Zhang
|
Tu Mei
Proceedings of the Seventh Conference on Machine Translation (WMT)
Automatic post-editing (APE) models learn from human post-edits and are often used to modify the output of a machine translation (MT) system so that it is as close as possible to a human translation. We introduce the system used in our submission to the WMT’22 Automatic Post-Editing (APE) English-Marathi (En-Mr) shared task. In this task, we first train an En-Mr MT system to generate additional machine-translated sentences. Then we use the additional triples to build our APE model and use the APE dataset for further fine-tuning. Inspired by the mixture of experts (MoE), we use the GMM algorithm to roughly divide the text of the APE dataset into three categories. After that, experts are added to the APE model, and data from different domains are sent to different experts. Finally, we ensemble the models to get better performance. Our APE system significantly improves the provided MT results, by -2.848 TER and +3.74 BLEU on the development dataset. On the blind test set, the TER and BLEU scores improve by -1.22 and +2.41, respectively.
2021
Research on the Natural Language Normalness of Software Identifiers (软件标识符的自然语言规范性研究)
Dongzhen Wen (汶东震)
|
Fan Zhang (张帆)
|
Xiao Zhang (张晓)
|
Liang Yang (杨亮)
|
Yuan Lin (林原)
|
Bo Xu (徐博)
|
Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
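Understanding software source code is central to collaborative software development and maintenance, and understanding identifiers, which make up more than half of source code, plays an important role in software comprehension. Traditional software engineering mainly studies how naming conventions can constrain the identifier naming process so as to construct identifiers that are easier to understand and communicate. Building on a review and analysis of the naming conventions of common programming languages, this paper proposes a new evaluation standard for identifier comprehensibility. Specifically, the paper first summarizes the naming conventions of common mainstream programming languages and, by analogy with the concept of the morpheme in natural language, proposes an identifier formation process based on software morphemes, in which the formation of an identifier can be viewed as the generation, arrangement, and concatenation of software morphemes. On this basis, the paper proposes a method for evaluating the normalness of software identifiers that incorporates natural language corpora, in order to measure whether software identifiers are easy to understand. Finally, the paper conducts validation experiments on the normalness metric using a source code comprehension dataset and open-source projects on the GitHub platform; the results show that the proposed normalness score is a good measure of the comprehensibility of software projects.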
Improving Faithfulness in Abstractive Summarization with Contrast Candidate Generation and Selection
Sihao Chen
|
Fan Zhang
|
Kazoo Sone
|
Dan Roth
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Despite significant progress in neural abstractive summarization, recent studies have shown that current models are prone to generating summaries that are unfaithful to the original context. To address the issue, we study contrast candidate generation and selection as a model-agnostic post-processing technique to correct the extrinsic hallucinations (i.e., information not present in the source text) in unfaithful summaries. We learn a discriminative correction model by generating alternative candidate summaries where named entities and quantities in the generated summary are replaced with ones with compatible semantic types from the source document. This model is then used to select the best candidate as the final output summary. Our experiments and analysis across a number of neural summarization systems show that our proposed method is effective in identifying and correcting extrinsic hallucinations. We analyze the typical hallucination phenomena produced by different types of neural summarization systems, in the hope of providing insights for future work in this direction.
On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation
Wei Zhang
|
Ziming Huang
|
Yada Zhu
|
Guangnan Ye
|
Xiaodong Cui
|
Fan Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
With recent advances in natural language processing, state-of-the-art models and datasets are usually very large, which challenges the application of sample-based explanation methods in many respects, such as explanation interpretability, efficiency, and faithfulness. In this work, for the first time, we improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit. On top of this, we implement a Hessian-free method with a model faithfulness guarantee. Finally, to compare our method with others, we propose a semantic-based evaluation metric that aligns better with human judgment of explanations than the widely adopted diagnostic or re-training measures. The empirical results on multiple real datasets demonstrate the proposed method’s superior performance over popular explanation techniques such as Influence Function or TracIn on semantic evaluation.
2017
A Corpus of Annotated Revisions for Studying Argumentative Writing
Fan Zhang
|
Homa B. Hashemi
|
Rebecca Hwa
|
Diane Litman
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper presents ArgRewrite, a corpus of between-draft revisions of argumentative essays. Drafts are manually aligned at the sentence level, and the writer’s purpose for each revision is annotated with categories analogous to those used in argument mining and discourse analysis. The corpus should enable advanced research in writing comparison and revision analysis, as demonstrated via our own studies of student revision behavior and of automatic revision purpose prediction.
2016
Extracting PDTB Discourse Relations from Student Essays
Kate Forbes-Riley
|
Fan Zhang
|
Diane Litman
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Using Context to Predict the Purpose of Argumentative Writing Revisions
Fan Zhang
|
Diane Litman
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
ArgRewrite: A Web-based Revision Assistant for Argumentative Writings
Fan Zhang
|
Rebecca Hwa
|
Diane Litman
|
Homa B. Hashemi
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
Inferring Discourse Relations from PDTB-style Discourse Labels for Argumentative Revision Classification
Fan Zhang
|
Diane Litman
|
Katherine Forbes Riley
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Penn Discourse Treebank (PDTB)-style annotation focuses on labeling local discourse relations between text spans and typically ignores larger discourse contexts. In this paper we propose two approaches to infer discourse relations in a paragraph-level context from annotated PDTB labels. We investigate the utility of inferring such discourse information using the task of revision classification. Experimental results demonstrate that the inferred information can significantly improve classification performance compared to baselines, not only when PDTB annotation comes from humans but also from automatic parsers.
2015
Peking: Building Semantic Dependency Graphs with a Hybrid Parser
Yantao Du
|
Fan Zhang
|
Xun Zhang
|
Weiwei Sun
|
Xiaojun Wan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
Annotation and Classification of an Email Importance Corpus
Fan Zhang
|
Kui Xu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Annotation and Classification of Argumentative Writing Revisions
Fan Zhang
|
Diane Litman
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications
2014
Peking: Profiling Syntactic Tree Parsing Techniques for Semantic Graph Parsing
Yantao Du
|
Fan Zhang
|
Weiwei Sun
|
Xiaojun Wan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)
Sentence-level Rewriting Detection
Fan Zhang
|
Diane Litman
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications
2013
WordTopic-MultiRank: A New Method for Automatic Keyphrase Extraction
Fan Zhang
|
Lian’en Huang
|
Bo Peng
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
SentTopic-MultiRank: a Novel Ranking Model for Multi-Document Summarization
Wenpeng Yin
|
Yulong Pei
|
Fan Zhang
|
Lian’en Huang
Proceedings of COLING 2012
2011
Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
Fan Zhang
|
Shuming Shi
|
Jing Liu
|
Shuqi Sun
|
Chin-Yew Lin
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies