Sucheta Ghosh


Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts
Mark-Christoph Müller | Sucheta Ghosh | Ulrike Wittig | Maja Rey
Proceedings of the 20th Workshop on Biomedical Language Processing

We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.


Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain
Mark-Christoph Müller | Sucheta Ghosh | Maja Rey | Ulrike Wittig | Wolfgang Müller | Michael Strube
Proceedings of the First Workshop on Scholarly Document Processing

We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.


Mining Fine-grained Opinion Expressions with Shallow Parsing
Sucheta Ghosh | Sara Tonelli | Richard Johansson
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013


Improving the Recall of a Discourse Parser by Constraint-based Postprocessing
Sucheta Ghosh | Richard Johansson | Giuseppe Riccardi | Sara Tonelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking. These methods use a set of natural structural constraints as well as others that follow from the annotation guidelines of the Penn Discourse Treebank. We evaluated the resulting systems on the standard test set of the PDTB and achieved a rebalancing of precision and recall with improved F-measures across the board. This was especially notable when we used evaluation metrics taking partial matches into account; for these measures, we achieved F-measure improvements of several points.

Global Features for Shallow Discourse Parsing
Sucheta Ghosh | Giuseppe Riccardi | Richard Johansson
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue


Shallow Discourse Parsing with Conditional Random Fields
Sucheta Ghosh | Richard Johansson | Giuseppe Riccardi | Sara Tonelli
Proceedings of 5th International Joint Conference on Natural Language Processing