Yucheng Lu
2025
Can Reasoning LLMs Synthesize Complex Climate Statements?
Yucheng Lu
Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025)
Accurately synthesizing climate evidence into concise statements is crucial for policy making and fostering public trust in climate science. Recent advances in Large Language Models (LLMs), particularly the emergence of reasoning-optimized variants that excel at mathematical and logical tasks, present a promising yet untested opportunity for scientific evidence synthesis. We evaluate state-of-the-art reasoning LLMs on two key tasks: (1) *contextual confidence classification*, assigning appropriate confidence levels to climate statements based on evidence, and (2) *factual summarization of climate evidence*, generating concise summaries evaluated for coherence, faithfulness, and similarity to expert-written versions. Using a novel dataset of 612 structured examples constructed from the Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC), we find that reasoning LLMs outperform general-purpose models in confidence classification by 8 percentage points in both accuracy and macro-F1. For summarization, however, performance differences between model types are mixed. Our findings demonstrate that reasoning LLMs show promise as auxiliary tools for confidence assessment in climate evidence synthesis, while highlighting significant limitations in their direct application to climate evidence summarization. This work establishes a foundation for future research on the targeted integration of LLMs into scientific assessment workflows.
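To make the first task concrete, here is a minimal sketch of contextual confidence classification: prompting an LLM to assign an IPCC-style calibrated confidence level to a statement given supporting evidence. The paper does not specify its prompts or models, so the prompt wording, the `gpt-4o` model name, and the fallback behavior below are illustrative assumptions; only the five-level IPCC confidence scale is taken from AR6 conventions.

```python
# Hedged sketch of contextual confidence classification (not the paper's exact setup).
from openai import OpenAI

# IPCC AR6 calibrated confidence levels used as the label space.
CONFIDENCE_LEVELS = ["very low", "low", "medium", "high", "very high"]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def classify_confidence(statement: str, evidence: str, model: str = "gpt-4o") -> str:
    """Ask an LLM to assign an IPCC-style confidence level to a climate statement."""
    prompt = (
        "You are assisting with climate evidence synthesis.\n"
        f"Statement: {statement}\n"
        f"Supporting evidence: {evidence}\n"
        "Based only on the evidence, assign one confidence level from: "
        + ", ".join(CONFIDENCE_LEVELS)
        + ". Reply with the level only."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = (response.choices[0].message.content or "").strip().lower()
    # Fall back to "medium" if the model replies with anything outside the scale.
    return answer if answer in CONFIDENCE_LEVELS else "medium"
```

In practice, the predicted level would be compared against the expert-assigned level from the AR6-derived examples to compute accuracy and macro-F1, as in the evaluation described above.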
Tracking Green Industrial Policies with LLMs: A Demonstration
Yucheng Lu
Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)
Green industrial policies (GIPs) are government interventions that support environmentally sustainable economic growth through targeted incentives, regulations, and investments in clean technologies. As the backbone of climate mitigation and adaptation, GIPs deserve systematic documentation and analysis. However, two major hurdles impede this systematic documentation. First, unlike other climate policy documents, such as Nationally Determined Contributions (NDCs), which are centrally curated, GIPs are scattered across numerous pieces of government legislation and policy announcements. Second, extracting information from these diverse documents is expensive when relying on expert annotation. We address this gap by proposing GreenSpyder, an LLM-based workflow that actively monitors, classifies, and annotates GIPs from open-source information. As a demonstration, we benchmark LLM performance in classifying and annotating GIPs on a small expert-curated dataset. Our results show that LLMs can be quite effective for classification and coarse annotation tasks, though they still need improvement for more nuanced classification. Finally, as a real-world application, we apply GreenSpyder to U.S. Legislative Records from the 117th Congress, paving the way for more comprehensive LLM-based GIP documentation in the future.
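The classification step of a GreenSpyder-like workflow could look roughly like the sketch below: an LLM decides whether a legislative record constitutes a green industrial policy and attaches a coarse instrument-type tag. The prompt, the instrument taxonomy, and the model name are assumptions for illustration; the paper's actual annotation schema and pipeline are not reproduced here.

```python
# Hedged sketch of the GIP classification/coarse-annotation step (illustrative only).
from openai import OpenAI

# Hypothetical coarse instrument taxonomy; not the paper's schema.
INSTRUMENT_TYPES = ["subsidy", "tax incentive", "regulation", "public investment", "other"]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def classify_gip(policy_text: str, model: str = "gpt-4o") -> dict:
    """Label a policy document as GIP / non-GIP and tag a coarse instrument type."""
    prompt = (
        "Read the following government policy text.\n"
        f"---\n{policy_text}\n---\n"
        "1. Is this a green industrial policy, i.e. an intervention supporting "
        "environmentally sustainable economic growth? Answer yes or no.\n"
        "2. If yes, which instrument type fits best: " + ", ".join(INSTRUMENT_TYPES) + "?\n"
        "Answer exactly as: <yes|no>; <instrument type or n/a>"
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content or ""
    is_gip, _, instrument = reply.partition(";")
    return {
        "is_gip": is_gip.strip().lower() == "yes",
        "instrument": instrument.strip() or "n/a",
    }
```

Applied at scale, such a step would screen bill texts (e.g., from the 117th Congress records mentioned above) before finer-grained annotation, with expert review reserved for the nuanced cases where LLM performance remains weaker.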