Subhendu Khatuya
2023
Financial Numeric Extreme Labelling: A dataset and benchmarking
Soumya Sharma | Subhendu Khatuya | Manjunath Hegde | Afreen Shaikh | Koustuv Dasgupta | Pawan Goyal | Niloy Ganguly
Findings of the Association for Computational Linguistics: ACL 2023
The U.S. Securities and Exchange Commission (SEC) mandates all public companies to file periodic financial statements that should contain numerals annotated with a particular label from a taxonomy. In this paper, we formulate the task of automating the assignment of a label to a particular numeral span in a sentence from an extremely large label set. Towards this task, we release a dataset, Financial Numeric Extreme Labelling (FNXL), annotated with 2,794 labels. We benchmark the performance of the FNXL dataset by formulating the task as (a) a sequence labelling problem and (b) a pipeline with span extraction followed by Extreme Classification. Although the two approaches perform comparably, the pipeline solution provides a slight edge for the least frequent labels.