Abstract
We present our contributions for the 2020 FinTOC Shared Tasks: Title Detection and Table of Contents Extraction. For the Structure Extraction task, we propose an approach that combines information from multiple sources: the table of contents, the wording of the document, and lexical domain knowledge. For the title detection task, we compare surface features to character-based features on various training configurations. We show that title detection results are very sensitive to the kind of training dataset used.
- Anthology ID:
- 2020.fnp-1.30
- Volume:
- Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Venue:
- FNP
- Publisher:
- COLING
- Pages:
- 174–180
- URL:
- https://aclanthology.org/2020.fnp-1.30
- Cite (ACL):
- Emmanuel Giguet, Gaël Lejeune, and Jean-Baptiste Tanguy. 2020. Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction. In Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation, pages 174–180, Barcelona, Spain (Online). COLING.
- Cite (Informal):
- Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction (Giguet et al., FNP 2020)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2020.fnp-1.30.pdf