Lightweight Contextual Logical Structure Recovery
Po-Wei Huang, Abhinav Ramesh Kashyap, Yanxia Qin, Yajing Yang, Min-Yen Kan
Abstract
Logical structure recovery in scientific articles associates text with a semantic section of the article. Although previous work has disregarded the surrounding context of a line, we model this important information by employing line-level attention on top of a transformer-based scientific document processing pipeline. With the addition of loss function engineering and data augmentation techniques with semi-supervised learning, our method improves classification performance by 10% compared to a recent state-of-the-art model. Our parsimonious, text-only method achieves a performance comparable to that of other works that use rich document features such as font and spatial position, using less data without sacrificing performance, resulting in a lightweight training pipeline.
- Anthology ID:
- 2022.sdp-1.5
- Volume:
- Proceedings of the Third Workshop on Scholarly Document Processing
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Venue:
- sdp
- Publisher:
- Association for Computational Linguistics
- Pages:
- 37–48
- URL:
- https://aclanthology.org/2022.sdp-1.5
- Cite (ACL):
- Po-Wei Huang, Abhinav Ramesh Kashyap, Yanxia Qin, Yajing Yang, and Min-Yen Kan. 2022. Lightweight Contextual Logical Structure Recovery. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 37–48, Gyeongju, Republic of Korea. Association for Computational Linguistics.
- Cite (Informal):
- Lightweight Contextual Logical Structure Recovery (Huang et al., sdp 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.sdp-1.5.pdf