Abstract
In this paper, we discuss our ongoing efforts to construct a scientific paper browsing system that helps users read and understand advanced technical content distributed as PDF. Because PDF is a format designed specifically for printing, the layout and logical structure of a document are embedded in the file without distinction. Extracting natural language text from PDF files takes considerable effort, as does the reverse task of displaying semantic annotations produced by NLP tools on the original page layout. In our browsing system, we tackle these issues, which are caused by the gap between printable documents and plain text. Our system provides ways to extract natural language sentences from PDF files together with their logical structures, and also to map arbitrary textual spans to their corresponding regions on page images. We set up a demonstration system using papers published in the ACL Anthology and demonstrate the enhanced search and refined recommendation functions, which we plan to make widely available to NLP researchers.
- Anthology ID:
- C16-2029
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editor:
- Hideo Watanabe
- Venue:
- COLING
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 136–140
- URL:
- https://aclanthology.org/C16-2029
- Cite (ACL):
- Takeshi Abekawa and Akiko Aizawa. 2016. SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pages 136–140, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation (Abekawa & Aizawa, COLING 2016)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/C16-2029.pdf