Fan Zhang


2021

软件标识符的自然语言规范性研究(Research on the Natural Language Normalness of Software Identifiers)
Dongzhen Wen (汶东震) | Fan Zhang (张帆) | Xiao Zhang (张晓) | Liang Yang (杨亮) | Yuan Lin (林原) | Bo Xu (徐博) | Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

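Understanding software source code is at the core of collaborative software development and maintenance, and understanding identifiers, which make up more than half of source code, plays an important role in software comprehension. Traditional software engineering has mainly studied how naming conventions can constrain the identifier naming process so that identifiers are easier to understand and communicate. Building on a survey and analysis of the naming conventions of common programming languages, this paper proposes a new evaluation standard for identifier comprehensibility. Specifically, we first summarize the naming conventions of mainstream programming languages and, by analogy with the concept of the morpheme in natural language, propose a software-morpheme-based account of identifier formation: constructing an identifier can be viewed as a process of generating, arranging, and connecting software morphemes. On this basis, we propose a method for evaluating the normalness of software identifiers that draws on a natural language corpus, in order to measure whether software identifiers are easy to understand. Finally, we run validation experiments on the normalness metric using a source code comprehension dataset and open-source projects on the GitHub platform; the results show that the proposed normalness score measures the comprehensibility of software projects well.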

Improving Faithfulness in Abstractive Summarization with Contrast Candidate Generation and Selection
Sihao Chen | Fan Zhang | Kazoo Sone | Dan Roth
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Despite significant progress in neural abstractive summarization, recent studies have shown that current models are prone to generating summaries that are unfaithful to the original context. To address this issue, we study contrast candidate generation and selection as a model-agnostic post-processing technique for correcting extrinsic hallucinations (i.e., information not present in the source text) in unfaithful summaries. We learn a discriminative correction model by generating alternative candidate summaries in which named entities and quantities in the generated summary are replaced with ones of compatible semantic types from the source document. This model is then used to select the best candidate as the final output summary. Our experiments and analysis across a number of neural summarization systems show that the proposed method is effective in identifying and correcting extrinsic hallucinations. We also analyze the typical hallucination phenomena produced by different types of neural summarization systems, in the hope of providing insights for future work in this direction.

On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation
Wei Zhang | Ziming Huang | Yada Zhu | Guangnan Ye | Xiaodong Cui | Fan Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

With recent advances in natural language processing, state-of-the-art models and datasets are usually very large, which challenges the application of sample-based explanation methods in many respects, such as explanation interpretability, efficiency, and faithfulness. In this work, we improve, for the first time, the interpretability of explanations by allowing arbitrary text sequences as the explanation unit. On top of this, we implement a Hessian-free method with a model faithfulness guarantee. Finally, to compare our method with others, we propose a semantic-based evaluation metric that aligns better with human judgment of explanations than the widely adopted diagnostic or re-training measures. Empirical results on multiple real datasets demonstrate the proposed method’s superior performance over popular explanation techniques such as Influence Function or TracIn on semantic evaluation.

2017

A Corpus of Annotated Revisions for Studying Argumentative Writing
Fan Zhang | Homa B. Hashemi | Rebecca Hwa | Diane Litman
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper presents ArgRewrite, a corpus of between-draft revisions of argumentative essays. Drafts are manually aligned at the sentence level, and the writer’s purpose for each revision is annotated with categories analogous to those used in argument mining and discourse analysis. The corpus should enable advanced research in writing comparison and revision analysis, as demonstrated via our own studies of student revision behavior and of automatic revision purpose prediction.

2016

Using Context to Predict the Purpose of Argumentative Writing Revisions
Fan Zhang | Diane Litman
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

ArgRewrite: A Web-based Revision Assistant for Argumentative Writings
Fan Zhang | Rebecca Hwa | Diane Litman | Homa B. Hashemi
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Extracting PDTB Discourse Relations from Student Essays
Kate Forbes-Riley | Fan Zhang | Diane Litman
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Inferring Discourse Relations from PDTB-style Discourse Labels for Argumentative Revision Classification
Fan Zhang | Diane Litman | Katherine Forbes Riley
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Penn Discourse Treebank (PDTB)-style annotation focuses on labeling local discourse relations between text spans and typically ignores larger discourse contexts. In this paper we propose two approaches to infer discourse relations in a paragraph-level context from annotated PDTB labels. We investigate the utility of inferring such discourse information using the task of revision classification. Experimental results demonstrate that the inferred information can significantly improve classification performance compared to baselines, not only when PDTB annotation comes from humans but also from automatic parsers.

2015

Annotation and Classification of Argumentative Writing Revisions
Fan Zhang | Diane Litman
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

Annotation and Classification of an Email Importance Corpus
Fan Zhang | Kui Xu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Peking: Building Semantic Dependency Graphs with a Hybrid Parser
Yantao Du | Fan Zhang | Xun Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Sentence-level Rewriting Detection
Fan Zhang | Diane Litman
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

Peking: Profiling Syntactic Tree Parsing Techniques for Semantic Graph Parsing
Yantao Du | Fan Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

WordTopic-MultiRank: A New Method for Automatic Keyphrase Extraction
Fan Zhang | Lian’en Huang | Bo Peng
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

SentTopic-MultiRank: a Novel Ranking Model for Multi-Document Summarization
Wenpeng Yin | Yulong Pei | Fan Zhang | Lian’en Huang
Proceedings of COLING 2012

2011

Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
Fan Zhang | Shuming Shi | Jing Liu | Shuqi Sun | Chin-Yew Lin
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies