Florian Faust
2025
Integrating Expert Labels into LLM-based Emission Goal Detection: Example Selection vs Automatic Prompt Design
Marco Wrzalik
|
Adrian Ulges
|
Anne Uersfeld
|
Florian Faust
|
Viola Campos
Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025)
We address the detection of emission reduction goals in corporate reports, an important task for monitoring companies’ progress in addressing climate change. Specifically, we focus on the issue of integrating expert feedback in the form of labeled example passages into LLM-based pipelines, and compare the two strategies of (1) a dynamic selection of few-shot examples and (2) the automatic optimization of the prompt by the LLM itself. Our findings on a public dataset of 769 climate-related passages from real-world business reports indicate that automatic prompt optimization is the superior approach, while combining both methods provides only limited benefit. Qualitative results indicate that optimized prompts do indeed capture many intricacies of the targeted emission goal extraction task.
2024
NetZeroFacts: Two-Stage Emission Information Extraction from Company Reports
Marco Wrzalik
|
Florian Faust
|
Simon Sieber
|
Adrian Ulges
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing
We address the challenge of efficiently extracting structured emission information, specifically emission goals, from company reports. Leveraging the potential of Large Language Models (LLMs), we propose a two-stage pipeline that first filters and retrieves potentially relevant passages and then extracts structured information from them using a generative model. We contribute an annotated dataset covering over 14.000 text passages, from which we extracted 739 expert annotated facts. On this dataset, we investigate the accuracy, efficiency and limitations of LLM-based emission information extraction, evaluate different retrieval techniques, and assess efficiency gains for human analysts by using the proposed pipeline. Our research demonstrates the promise of LLM technology in addressing the intricate task of sustainable emission data extraction from company reports.