Yiheng Wu
2026
EpiGator: An Event-based Surveillance System for Infectious Disease Outbreaks
Yiheng Wu | Jue Hou | Trangcasanchai Sathianpong | Lidia Pivovarova | Roman Yangarber
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present EpiGator, a novel event-based system for global surveillance of infectious disease outbreaks, which automatically processes streams of news articles and generates outbreak reports that are crucial for medical authorities. The goal of our work is to combine our experience in outbreak surveillance with state-of-the-art large language models (LLMs), which allows us to reduce the overall cost of system development and maintenance. The EpiGator pipeline combines keyword filtering, relevance classification, event-based clustering, and multi-document summarization. A key novelty lies in using a fine-tuned LLM to identify articles relevant to ongoing outbreaks, followed by a zero-shot information-extraction pipeline that normalizes event features and clusters related articles. For each cluster, we generate an outbreak summary using instruction-tuned LLMs. We evaluate EpiGator's output against disease outbreak reports written by medical specialists.
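The four-stage pipeline described in the abstract (keyword filtering, relevance classification, event-based clustering, summarization) can be sketched as follows. This is a minimal toy illustration, not the authors' code: every function body is a hypothetical stand-in for the LLM-based component it names, and the keyword list and record fields are invented for the example.

```python
# Toy sketch of an EpiGator-style surveillance pipeline.
# All function bodies are hypothetical stand-ins for the LLM components.
from collections import defaultdict

KEYWORDS = {"outbreak", "epidemic", "cholera", "measles"}  # illustrative seed list

def keyword_filter(articles):
    """Stage 1: keep articles mentioning any surveillance keyword."""
    return [a for a in articles if KEYWORDS & set(a["text"].lower().split())]

def is_relevant(article):
    """Stage 2: stand-in for the fine-tuned LLM relevance classifier."""
    return "cases" in article["text"].lower()

def extract_event(article):
    """Stage 3: stand-in for zero-shot extraction of normalized event
    features (disease, location); here read directly from the record."""
    return (article["disease"], article["location"])

def build_clusters(articles):
    """Group relevant articles by their normalized (disease, location) key."""
    clusters = defaultdict(list)
    for a in keyword_filter(articles):
        if is_relevant(a):
            clusters[extract_event(a)].append(a)
    return clusters

def summarize(cluster):
    """Stage 4: stand-in for instruction-tuned LLM summarization."""
    disease, location = extract_event(cluster[0])
    return f"{disease} outbreak in {location}: {len(cluster)} reports"
```

The point of the sketch is the data flow: each stage narrows or restructures the article stream, and only the final per-cluster step produces reader-facing text.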
2025
Know-AI at TSAR 2025 Shared Task: Difficulty-aware Text Simplification System
Yiheng Wu | Anisia Katinskaia | Jue Hou | Roman Yangarber
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)
Text simplification is an active research topic with applications in multiple domains. In a simplification pipeline, assessment of text difficulty plays a crucial role as a quality-control mechanism: it acts as a critic and guides models to generate text at the difficulty level required by the user. This paper presents our Difficulty-aware Text Simplification System. We evaluate our pipeline using the TSAR shared task dataset and discuss challenges in constructing corpora for training models to assess text difficulty.
Estimation of Text Difficulty in the Context of Language Learning
Anisia Katinskaia | Anh-Duc Vu | Jue Hou | Ulla Vanhatalo | Yiheng Wu | Roman Yangarber
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Easy language and text simplification are currently topical research questions, with important applications in many contexts, and with various approaches under active investigation, including prompt-based methods. The estimation of the level of difficulty of a text becomes a crucial challenge when the estimator is employed in a simplification workflow as a quality-control mechanism. It can act as a critic in frameworks where it can guide other models, which are responsible for generating text at a specified level of difficulty, as determined by the user’s needs. We present our work in the context of simplified Finnish. We discuss problems in collecting corpora for training models for estimation of text difficulty, and our experiments with estimation models. The results of the experiments are promising: the models appear usable both for assessment and for deployment as a component in a larger simplification framework.
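The critic role described in the abstract, where a difficulty estimator guides a generator toward a target level, can be sketched as a simple control loop. This is a hedged illustration only: `estimate_level` and `simplify` are hypothetical toy proxies, not the trained models the paper evaluates.

```python
# Toy sketch of a difficulty estimator acting as a critic in a
# simplification loop; both models below are hypothetical stand-ins.
def estimate_level(text):
    """Toy proxy for a difficulty estimator: mean word length in characters."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

def simplify(text):
    """Toy proxy for a simplification generator: keep only short words."""
    return " ".join(w for w in text.split() if len(w) <= 6)

def simplify_to_level(text, target_level, max_rounds=3):
    """Critic loop: re-simplify until the estimator accepts the output,
    or until the round budget runs out."""
    for _ in range(max_rounds):
        if estimate_level(text) <= target_level:
            break
        text = simplify(text)
    return text
```

The design point is that the estimator never generates text itself; it only decides whether another round of simplification is needed, which is how a quality-control critic slots into a larger framework.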
Can Large Language Models Tackle Graph Partitioning?
Yiheng Wu | Ningchao Ge | Yanmin Li | Liwei Qian | Mengna Zhu | Haoyu Yang | Haiwen Chen | Jibing Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) demonstrate remarkable capabilities in understanding complex tasks and have achieved commendable performance in graph-related tasks, such as node classification, link prediction, and subgraph classification. These tasks primarily depend on local reasoning over the graph structure. However, research has yet to address the graph partitioning task, which requires global perception abilities. Our preliminary findings reveal that vanilla LLMs can only handle graph partitioning on extremely small-scale graphs. To overcome this limitation, we propose a three-phase pipeline to empower LLMs for large-scale graph partitioning: coarsening, reasoning, and refining. The coarsening phase reduces graph complexity. The reasoning phase captures both global and local patterns to generate a coarse partition. The refining phase ensures topological consistency by projecting the coarse-grained partitioning results back to the original graph structure. Extensive experiments demonstrate that our framework enables LLMs to perform graph partitioning across varying graph scales, validating both the effectiveness of LLMs for partitioning tasks and the practical utility of our proposed methodology.
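The coarsen-reason-refine scheme outlined in the abstract can be sketched in miniature. This is a toy illustration under stated assumptions: coarsening here is a simple greedy edge-matching contraction, and the "reason" step is a trivial balanced split standing in for the LLM that the paper actually uses.

```python
# Toy sketch of a coarsen-reason-refine partitioning pipeline;
# reason() is a hypothetical stand-in for the LLM reasoning phase.
def coarsen(nodes, edges):
    """Contract a greedy matching: each matched pair becomes one supernode."""
    matched, mapping = set(), {}
    for u, v in edges:
        if u not in matched and v not in matched:
            matched |= {u, v}
            mapping[u] = mapping[v] = u  # supernode named after u
    for n in nodes:
        mapping.setdefault(n, n)       # unmatched nodes map to themselves
    coarse_edges = {(mapping[u], mapping[v]) for u, v in edges
                    if mapping[u] != mapping[v]}
    return mapping, coarse_edges

def reason(coarse_nodes):
    """Stand-in for the LLM: split coarse nodes into two balanced parts."""
    ordered = sorted(coarse_nodes)
    half = len(ordered) // 2
    return {n: (0 if i < half else 1) for i, n in enumerate(ordered)}

def refine(mapping, coarse_partition):
    """Project the coarse partition back onto the original nodes."""
    return {n: coarse_partition[s] for n, s in mapping.items()}

def partition(nodes, edges):
    mapping, _coarse_edges = coarsen(nodes, edges)
    return refine(mapping, reason(set(mapping.values())))
```

The sketch preserves the key invariant of the refining phase: nodes contracted into the same supernode always land in the same final part, so the coarse decision is consistently projected back to the full graph.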