Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts
Mark-Christoph Müller, Sucheta Ghosh, Ulrike Wittig, Maja Rey
Abstract
We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
- Anthology ID:
- 2021.bionlp-1.19
- Volume:
- Proceedings of the 20th Workshop on Biomedical Language Processing
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 168–179
- URL:
- https://aclanthology.org/2021.bionlp-1.19
- DOI:
- 10.18653/v1/2021.bionlp-1.19
- Cite (ACL):
- Mark-Christoph Müller, Sucheta Ghosh, Ulrike Wittig, and Maja Rey. 2021. Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 168–179, Online. Association for Computational Linguistics.
- Cite (Informal):
- Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts (Müller et al., BioNLP 2021)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2021.bionlp-1.19.pdf
- Code:
- nlpAThits/BioNLP2021