Robust Table Information Extraction from Sustainability Reports: A Time-Aware Hybrid Two-Step Approach

Hendrik Weichel, Martin Simon, Jörg Schäfer


Abstract
The extraction of emissions-related information from annual reports has become increasingly important due to the Corporate Sustainability Reporting Directive (CSRD), which mandates greater transparency in sustainability reporting. As a result, information extraction (IE) methods must be robust, ensuring accurate retrieval while minimizing false values. While large language models (LLMs) offer potential for this task, their black-box nature and lack of specialization in table structures limit their robustness, an essential requirement in risk-averse domains. In this work, we present a two-step hybrid approach that optimizes both accuracy and robustness. More precisely, we combine a rule-based step for table IE with a regularized LLM-based step, both leveraging temporal prior knowledge. Our tests demonstrate the advantages of combining structured rules with LLMs. Furthermore, the modular design of our method allows for flexible adaptation to various IE tasks, making it a practical solution for industry applications and a scalable assistive tool for information extraction.
Anthology ID:
2025.climatenlp-1.16
Volume:
Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025)
Month:
July
Year:
2025
Address:
Bangkok, Thailand
Editors:
Kalyan Dutia, Peter Henderson, Markus Leippold, Christopher Manning, Gaku Morio, Veruska Muccione, Jingwei Ni, Tobias Schimanski, Dominik Stammbach, Alok Singh, Alba (Ruiran) Su, Saeid A. Vaghefi
Venues:
ClimateNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
233–244
URL:
https://preview.aclanthology.org/landing_page/2025.climatenlp-1.16/
Cite (ACL):
Hendrik Weichel, Martin Simon, and Jörg Schäfer. 2025. Robust Table Information Extraction from Sustainability Reports: A Time-Aware Hybrid Two-Step Approach. In Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025), pages 233–244, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Robust Table Information Extraction from Sustainability Reports: A Time-Aware Hybrid Two-Step Approach (Weichel et al., ClimateNLP 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.climatenlp-1.16.pdf