FIRE: A Dataset for Financial Relation Extraction
Hassan Hamad, Abhinav Kumar Thakur, Nijil Kolleri, Sujith Pulikodan, Keith Chugg
Abstract
This paper introduces FIRE (**FI**nancial **R**elation **E**xtraction), a sentence-level dataset of named entities and relations within the financial sector. Comprising 3,025 instances, the dataset encapsulates 13 named entity types along with 18 relation types. Sourced from public financial reports and financial news articles, FIRE captures a wide array of financial information about a business including, but not limited to, corporate structure, business model, revenue streams, and market activities such as acquisitions. The full dataset was labeled by a single annotator to minimize labeling noise. The labeling time for each sentence was recorded during the labeling process. We show how this feature, along with curriculum learning techniques, can be used to improved a model’s performance. The FIRE dataset is designed to serve as a valuable resource for training and evaluating machine learning algorithms in the domain of financial information extraction. The dataset and the code to reproduce our experimental results are available at https://github.com/hmhamad/FIRE. The repository for the labeling tool can be found at https://github.com/abhinav-kumar-thakur/relation-extraction-annotator.- Anthology ID:
- 2024.findings-naacl.230
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2024
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 3628–3642
- Language:
- URL:
- https://aclanthology.org/2024.findings-naacl.230
- DOI:
- Cite (ACL):
- Hassan Hamad, Abhinav Kumar Thakur, Nijil Kolleri, Sujith Pulikodan, and Keith Chugg. 2024. FIRE: A Dataset for Financial Relation Extraction. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3628–3642, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- FIRE: A Dataset for Financial Relation Extraction (Hamad et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2024.findings-naacl.230.pdf