Yiheng Wu


2026

We present EpiGator, a novel event-based system for global surveillance of infectious disease outbreaks. It automatically processes streams of news articles and generates outbreak reports, which are crucial for medical authorities. The goal of our work is to combine our experience in outbreak surveillance with state-of-the-art large language models (LLMs), which allows us to reduce the overall cost of system development and maintenance. The EpiGator pipeline combines keyword filtering, relevance classification, event-based clustering, and multi-document summarization. A key novelty lies in using a fine-tuned LLM to identify articles relevant to ongoing outbreaks, followed by a zero-shot information extraction pipeline that normalizes the event features and clusters related articles. For each cluster, we generate an outbreak summary using instruction-tuned LLMs. We evaluate EpiGator's output against disease outbreak reports written by medical specialists.
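The four pipeline stages named above can be sketched as follows. This is a minimal illustrative skeleton, not the EpiGator implementation: the keyword set, the article fields, and the stub functions standing in for the fine-tuned relevance classifier, the zero-shot extractor, and the instruction-tuned summarizer are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical keyword list; a real system would use a curated lexicon.
KEYWORDS = {"outbreak", "epidemic", "cases", "virus"}

def keyword_filter(articles):
    """Stage 1: keep articles mentioning at least one outbreak keyword."""
    return [a for a in articles if KEYWORDS & set(a["text"].lower().split())]

def is_relevant(article):
    """Stage 2: stand-in for the fine-tuned LLM relevance classifier."""
    return "outbreak" in article["text"].lower()

def cluster_by_event(articles):
    """Stage 3: group articles by normalized (disease, location) features,
    standing in for zero-shot LLM information extraction."""
    clusters = defaultdict(list)
    for a in articles:
        clusters[(a["disease"], a["location"])].append(a)
    return clusters

def summarize(cluster):
    """Stage 4: stand-in for instruction-tuned multi-document summarization."""
    disease, location = cluster[0]["disease"], cluster[0]["location"]
    return f"{disease} outbreak in {location}: {len(cluster)} report(s)."

def run_pipeline(articles):
    relevant = [a for a in keyword_filter(articles) if is_relevant(a)]
    return [summarize(c) for c in cluster_by_event(relevant).values()]
```

In this sketch each LLM component is replaced by a one-line heuristic; the point is only the shape of the pipeline, in which each stage narrows or restructures the article stream before summaries are produced per event cluster.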

2025

Text simplification is an active research topic with applications in multiple domains. In a simplification pipeline, assessment of text difficulty plays a crucial role as a quality-control mechanism: it acts as a critic and guides models to generate text at the difficulty level required by the user. This paper presents our Difficulty-aware Text Simplification System. We evaluate our pipeline using the TSAR shared task dataset and discuss challenges in constructing corpora for training models to assess text difficulty.
Easy language and text simplification are currently topical research questions, with important applications in many contexts, and various approaches, including prompt-based methods, are under active investigation. Estimating the difficulty level of a text becomes a crucial challenge when the estimator is employed in a simplification workflow as a quality-control mechanism: it can act as a critic in frameworks where it guides other models that generate text at a difficulty level determined by the user's needs. We present our work in the context of simplified Finnish. We discuss problems in collecting corpora for training text-difficulty estimation models, and our experiments with such models. The results of the experiments are promising: the models appear usable both for standalone assessment and for deployment as a component in a larger simplification framework.
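The critic-guided loop described in these abstracts can be sketched as follows. Everything here is an assumption made for illustration: the difficulty proxy (mean word length), the lexicon-substitution simplifier, and the function names are stand-ins for the trained estimator and generation model the papers actually use.

```python
def difficulty(text):
    """Toy proxy for a learned difficulty estimator: mean word length.
    A real system would use a trained model, not this heuristic."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

def simplify_once(text, lexicon):
    """Stand-in for an LLM simplifier: substitute simpler synonyms."""
    return " ".join(lexicon.get(w, w) for w in text.split())

def critic_guided_simplify(text, lexicon, target, max_rounds=5):
    """Re-simplify until the estimated difficulty meets the user's target,
    with the difficulty estimator acting as the critic."""
    for _ in range(max_rounds):
        if difficulty(text) <= target:
            break
        simplified = simplify_once(text, lexicon)
        if simplified == text:  # no further progress possible
            break
        text = simplified
    return text
```

The design point is the feedback loop: the estimator never rewrites text itself, it only decides whether another simplification pass is needed, which is what lets it serve both as an assessment tool and as a pipeline component.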
Large language models (LLMs) demonstrate remarkable capabilities in understanding complex tasks and have achieved commendable performance in graph-related tasks such as node classification, link prediction, and subgraph classification. These tasks primarily depend on local reasoning over the graph structure. However, research has yet to address the graph partitioning task, which requires global perception abilities. Our preliminary findings reveal that vanilla LLMs can only handle graph partitioning on extremely small-scale graphs. To overcome this limitation, we propose a three-phase pipeline to empower LLMs for large-scale graph partitioning: coarsening, reasoning, and refining. The coarsening phase reduces graph complexity. The reasoning phase captures both global and local patterns to generate a coarse partition. The refining phase ensures topological consistency by projecting the coarse-grained partitioning results back to the original graph structure. Extensive experiments demonstrate that our framework enables LLMs to perform graph partitioning across varying graph scales, validating both the effectiveness of LLMs for partitioning tasks and the practical utility of our proposed methodology.
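The coarsen-reason-refine structure can be sketched as below. This is a hypothetical skeleton under simplifying assumptions: greedy edge matching for coarsening, a balanced two-way split standing in for the LLM reasoning phase, and label projection as the refinement step; the paper's actual phases are more involved.

```python
def coarsen(nodes, edges):
    """Greedy edge matching: merge each matched node pair into a supernode,
    reducing the graph the LLM must reason over."""
    mapping, used = {}, set()
    for u, v in edges:
        if u not in used and v not in used:
            used.update((u, v))
            mapping[u] = mapping[v] = f"{u}+{v}"
    for n in nodes:
        mapping.setdefault(n, n)  # unmatched nodes stay as themselves
    coarse_edges = {(mapping[u], mapping[v]) for u, v in edges
                    if mapping[u] != mapping[v]}
    return mapping, coarse_edges

def reason(coarse_nodes):
    """Stand-in for the LLM reasoning phase: split the supernodes
    into two balanced parts."""
    ordered = sorted(coarse_nodes)
    half = len(ordered) // 2
    return {n: int(i >= half) for i, n in enumerate(ordered)}

def refine(mapping, coarse_labels):
    """Project the coarse partition back onto the original graph, so every
    original node inherits the label of its supernode."""
    return {n: coarse_labels[s] for n, s in mapping.items()}
```

Because every original node maps to exactly one supernode, the projection in `refine` guarantees that nodes merged during coarsening end up in the same part, which is the consistency property the refining phase preserves.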