Financial Numeric Extreme Labelling: A dataset and benchmarking
Soumya Sharma, Subhendu Khatuya, Manjunath Hegde, Afreen Shaikh, Koustuv Dasgupta, Pawan Goyal, Niloy Ganguly
Abstract
The U.S. Securities and Exchange Commission (SEC) mandates all public companies to file periodic financial statements that should contain numerals annotated with a particular label from a taxonomy. In this paper, we formulate the task of automating the assignment of a label to a particular numeral span in a sentence from an extremely large label set. Towards this task, we release a dataset, Financial Numeric Extreme Labelling (FNXL), annotated with 2,794 labels. We benchmark the performance of the FNXL dataset by formulating the task as (a) a sequence labelling problem and (b) a pipeline with span extraction followed by Extreme Classification. Although the two approaches perform comparably, the pipeline solution provides a slight edge for the least frequent labels.
- Anthology ID:
- 2023.findings-acl.219
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3550–3561
- URL:
- https://aclanthology.org/2023.findings-acl.219
- DOI:
- 10.18653/v1/2023.findings-acl.219
- Cite (ACL):
- Soumya Sharma, Subhendu Khatuya, Manjunath Hegde, Afreen Shaikh, Koustuv Dasgupta, Pawan Goyal, and Niloy Ganguly. 2023. Financial Numeric Extreme Labelling: A dataset and benchmarking. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3550–3561, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Financial Numeric Extreme Labelling: A dataset and benchmarking (Sharma et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/2023.findings-acl.219.pdf
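
The two task formulations named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the sentence, the two label names, the regular expression, and the keyword-based `toy_scorer` are all hypothetical stand-ins chosen for illustration, and a real system would replace the scorer with a trained classifier over the full 2,794-label set.

```python
import re

# Hypothetical pattern for numerals such as "4,500" or "12.5" (an assumption,
# not the span extractor used in the paper).
NUMERAL = re.compile(r"\d[\d,]*(?:\.\d+)?")


def sequence_labels(tokens, gold):
    """Formulation (a): one tag per token; tokens without a label get 'O'."""
    return [gold.get(i, "O") for i in range(len(tokens))]


def extract_numeral_spans(tokens):
    """Formulation (b), stage 1: indices of tokens that look like numerals."""
    return [i for i, tok in enumerate(tokens) if NUMERAL.fullmatch(tok)]


def classify_span(tokens, index, label_scores):
    """Formulation (b), stage 2: choose the highest-scoring label for one span."""
    scores = label_scores(tokens, index)
    return max(scores, key=scores.get)


def toy_scorer(tokens, index):
    """Hypothetical scorer: keyword match on the three tokens before the numeral."""
    context = {t.lower() for t in tokens[max(0, index - 3):index]}
    return {
        "Revenues": float("revenue" in context),
        "NetIncomeLoss": float("income" in context),
    }


# Illustrative sentence and labels, not taken from the FNXL dataset.
tokens = "Revenue was 4,500 million and net income was 320 million".split()
gold = {2: "Revenues", 8: "NetIncomeLoss"}

tags = sequence_labels(tokens, gold)
spans = extract_numeral_spans(tokens)
predicted = [classify_span(tokens, i, toy_scorer) for i in spans]
```

In formulation (a) a single model must emit `tags` directly, so every token competes over the whole label set. In formulation (b) the label decision is made only for the extracted numeral spans, which is the stage where an extreme classifier would be used.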