Fangping Lan
2026
Making Revisions Understandable: A Survey of Edit Intentions, Methods, and Applications
Fangping Lan | Qi Zhang | Eduard Dragut
Findings of the Association for Computational Linguistics: ACL 2026
Text revision is a core process in document creation, capturing how authors iteratively refine, reorganize, and improve written content. With the increasing availability of large-scale revision histories from platforms such as Wikipedia and arXiv, NLP research has begun to move beyond modeling what changes are made to understanding why they are made, i.e., the underlying edit intentions. To our knowledge, this is the first survey that synthesizes text revision research through the lens of edit intentions, providing a unified view of datasets, taxonomies, identification methods, and applications. We review prior work across the full revision workflow, including revision corpus construction, edit intention taxonomy design, and edit intention identification. We further categorize representative datasets and methods, summarize downstream applications such as writing assistance and document edit summarization, and highlight key open research directions.
Scaling Performance and Low-Resource Annotation with Many-Shot In-Context Learning for Named Entity Recognition
Qi Zhang | Fangping Lan | Cornelia Caragea | Longin Jan Latecki | Eduard Dragut
Findings of the Association for Computational Linguistics: ACL 2026
In-context learning (ICL) with large language models (LLMs) has emerged as a powerful alternative to fine-tuning for Named Entity Recognition (NER), achieving strong performance with minimal annotation and no additional training. However, prior work has shown that despite their adaptability, LLMs still lag behind fully supervised models such as fine-tuned BERT in structured tasks like NER. While existing studies on ICL for NER have mainly explored few-shot settings, the potential of scaling to hundreds of demonstrations has not been thoroughly investigated. To address this gap, we conduct a comprehensive investigation of many-shot ICL for NER and further explore its effectiveness in annotating and refining data for low-resource NER tasks. Specifically, we evaluate various LLMs across multiple domains using hundreds of ICL examples and then assess the feasibility of using many-shot ICL as a data annotation framework. Our experiments demonstrate that: (1) scaling to hundreds of in-context examples enables LLMs to match or even surpass the performance of fully supervised BERT models; and (2) using about one hundred human-labeled examples as demonstrations, many-shot in-context annotation can generate high-quality labeled data, leading to approximately 10% absolute F1 improvement over existing state-of-the-art approaches when used to fine-tune BERT on low-resource NER.
2025
UniT: One Document, Many Revisions, Too Many Edit Intention Taxonomies
Fangping Lan | Abdullah Aljebreen | Eduard Dragut
Findings of the Association for Computational Linguistics: ACL 2025
Writing is inherently iterative, each revision enhancing information representation. One revision may contain many edits. Examination of the intentions behind edits provides valuable insights into an editor’s expertise, the dynamics of collaborative writing, and the evolution of a document. Current research on edit intentions lacks a comprehensive edit intention taxonomy (EIT) that spans multiple application domains. As a result, researchers often create new EITs tailored to specific needs, a process that is both time-consuming and costly. To address this gap, we propose UniT, a Unified edit intention Taxonomy that integrates existing EITs encompassing a wide range of edit intentions. We examine the lineage relationships and construction of 24 EITs, which together comprise 232 categories across various domains. During the literature survey and integration process, we identify challenges such as one-to-many category matches, incomplete definitions, and varying hierarchical structures, and we propose solutions for resolving them. Finally, our evaluation shows that UniT achieves higher inter-annotator agreement than existing EITs and is applicable to a large set of application domains.