Zehua Li


2025

Regulatory agencies often operate with limited resources and rely on tips from the public to identify potential violations. However, processing these tips at scale presents significant operational challenges, as agencies must correctly identify and route relevant tips to the appropriate enforcement divisions. Through a case study, we demonstrate how advances in large language models can be used to support overburdened agencies with limited capacity. In partnership with the U.S. Environmental Protection Agency, we leverage previously unstudied citizen tip data from their “Report a Violation” system to develop an LLM-assisted pipeline for tip routing. Our approach filters out 80.5% of irrelevant tips and increases overall routing accuracy from 31.8% to 82.4% compared to the current routing system. At a time of increased focus on government efficiency, our approach provides a constructive path forward by using technology to empower civil servants.
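
To illustrate the kind of triage step such a pipeline performs, the sketch below asks a model to either discard an irrelevant tip or assign it to an enforcement division. It is a minimal sketch, not the paper's pipeline: it assumes the OpenAI Python client, and the division labels, model name, and prompt wording are placeholders invented for illustration.

```python
# Minimal sketch of an LLM-assisted tip triage step (illustrative only).
# Division labels, model choice, and prompt text are placeholders, not the
# actual EPA routing pipeline described in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DIVISIONS = ["Air", "Water", "Waste and Chemicals", "Not a violation / irrelevant"]

def route_tip(tip_text: str) -> str:
    """Return a division label for a citizen tip, or flag it for human review."""
    prompt = (
        "You triage citizen tips for an environmental regulator.\n"
        f"Tip: {tip_text}\n"
        f"Answer with exactly one label from: {', '.join(DIVISIONS)}."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    label = response.choices[0].message.content.strip()
    # Fall back to manual review if the model returns an unexpected label.
    return label if label in DIVISIONS else "Needs human review"

print(route_tip("A factory upstream is discharging foamy water into the creek."))
```

In practice, tips the model marks as irrelevant can be filtered out before any human review, while low-confidence or unexpected outputs are escalated rather than silently routed.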

2021

In this paper, we explore the construction of natural language explanations for news claims, with the goal of assisting fact-checking and news evaluation applications. We experiment with two methods: (1) an extractive method based on Biased TextRank, a resource-effective unsupervised graph-based algorithm for content extraction; and (2) an abstractive method based on the GPT-2 language model. We perform comparative evaluations on two misinformation datasets in the political and health news domains, and find that the extractive method shows the most promise.
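
As a rough illustration of the extractive approach, the sketch below ranks sentences with personalized PageRank over a sentence-similarity graph, biasing the random-walk restarts toward sentences similar to the claim being explained. It is only a sketch of the idea: it uses TF-IDF similarity and networkx for convenience, whereas the original Biased TextRank work relies on different sentence representations.

```python
# Sketch of a Biased TextRank-style extractor (not the authors' code).
# Sentences are scored with personalized PageRank; the personalization
# weights come from each sentence's similarity to the claim.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def biased_textrank(sentences, claim, top_k=2):
    vec = TfidfVectorizer().fit(sentences + [claim])
    sent_vecs = vec.transform(sentences)
    claim_vec = vec.transform([claim])

    # Graph over sentences, edges weighted by pairwise cosine similarity.
    sim = cosine_similarity(sent_vecs)
    graph = nx.Graph()
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if sim[i, j] > 0:
                graph.add_edge(i, j, weight=float(sim[i, j]))

    # Bias the restart distribution toward sentences similar to the claim.
    bias = cosine_similarity(sent_vecs, claim_vec).ravel()
    personalization = {i: float(b) + 1e-6 for i, b in enumerate(bias)}

    scores = nx.pagerank(graph, personalization=personalization, weight="weight")
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]

if __name__ == "__main__":
    article = [
        "The study followed 2,000 participants over five years.",
        "Researchers found no link between the supplement and heart disease.",
        "The supplement's sales rose sharply last year.",
    ]
    print(biased_textrank(article, claim="The supplement causes heart disease."))
```

The extracted sentences then serve directly as the explanation, which is what makes the method resource-effective compared with generating text from a large language model.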