Maja Rey


2021

pdf
Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts
Mark-Christoph Müller | Sucheta Ghosh | Ulrike Wittig | Maja Rey
Proceedings of the 20th Workshop on Biomedical Language Processing

We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.

2020

pdf
Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain
Mark-Christoph Müller | Sucheta Ghosh | Maja Rey | Ulrike Wittig | Wolfgang Müller | Michael Strube
Proceedings of the First Workshop on Scholarly Document Processing

We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task