ISPRAS@FinTOC-2022 Shared Task: Two-stage TOC Generation Model
Anastasiia Bogatenkova, Oksana Vladimirovna Belyaeva, Andrew Igorevich Perminov, Ilya Sergeevich Kozlov
Abstract
This paper describes our participation in the FinTOC-2022 Shared Task: “Financial Document Structure Extraction”. The competition comprises two subtasks: title detection and TOC generation. We propose a pipeline for solving both: extraction of document lines and of the existing TOC, construction of a feature matrix, and classification. The classification model consists of two classifiers: a binary classifier first separates title lines from non-title lines, and a second classifier then determines the title level. In the title detection task, we obtained F1 scores of 0.900, 0.778 and 0.558, and in the TOC generation task we obtained 63.1, 41.5 and 40.79 for the harmonic mean of the Inex F1 score and Inex level accuracy, for English, French and Spanish documents respectively. With these results, our team took first place among the English and French submissions and second place among the Spanish submissions.
- Anthology ID:
- 2022.fnp-1.13
- Volume:
- Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Mahmoud El-Haj, Paul Rayson, Nadhem Zmandar
- Venue:
- FNP
- Publisher:
- European Language Resources Association
- Pages:
- 89–94
- URL:
- https://aclanthology.org/2022.fnp-1.13
- Cite (ACL):
- Anastasiia Bogatenkova, Oksana Vladimirovna Belyaeva, Andrew Igorevich Perminov, and Ilya Sergeevich Kozlov. 2022. ISPRAS@FinTOC-2022 Shared Task: Two-stage TOC Generation Model. In Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022, pages 89–94, Marseille, France. European Language Resources Association.
- Cite (Informal):
- ISPRAS@FinTOC-2022 Shared Task: Two-stage TOC Generation Model (Bogatenkova et al., FNP 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.fnp-1.13.pdf
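
The two-stage classification described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `TwoStageTitleModel`, the helper `harmonic_mean`, the toy features (a bold flag and a font size) and the toy decision rules are all assumptions made for the example, since this page does not specify the actual features or classifiers. The sketch only shows the cascade structure (a binary title detector followed by a level classifier applied to detected titles) and how two scores on the same scale combine into a harmonic mean.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Features = Sequence[float]


@dataclass
class TwoStageTitleModel:
    """Cascade: stage 1 detects titles, stage 2 assigns a level to detected titles only."""

    is_title: Callable[[Features], bool]    # stage 1: title vs. non-title
    title_level: Callable[[Features], int]  # stage 2: level of a detected title

    def predict(self, rows: Sequence[Features]) -> List[Optional[int]]:
        # None marks a non-title line; an int is the predicted title level.
        return [self.title_level(r) if self.is_title(r) else None for r in rows]


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative scores; 0.0 when both are zero."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


# Toy stand-ins for trained classifiers (hypothetical rules and features):
# each row is (is_bold, font_size).
model = TwoStageTitleModel(
    is_title=lambda r: r[0] == 1.0,
    title_level=lambda r: 1 if r[1] >= 14 else 2,
)
levels = model.predict([(1.0, 16.0), (0.0, 10.0), (1.0, 12.0)])
```

In practice the two callables would be replaced by trained classifiers operating on rows of the feature matrix. Keeping the stages separate means the level classifier only ever sees lines that the first stage has accepted as titles.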