Tatyana Shmanina


A Corpus of Tables in Full-Text Biomedical Research Publications
Tatyana Shmanina | Ingrid Zukerman | Ai Lee Cheam | Thomas Bochynek | Lawrence Cavedon
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

The development of text mining techniques for biomedical research literature has received increased attention in recent years. However, most of these techniques focus on prose, while much important biomedical data reside in tables. In this paper, we present a corpus created to serve as a gold standard for the development and evaluation of techniques for the automatic extraction of information from biomedical tables. We describe the guidelines used for corpus annotation and how they were developed. The high inter-annotator agreement achieved on the corpus, together with the generic nature of our annotation approach, suggests that the developed guidelines can serve as a general framework for table annotation in biomedical and other scientific domains. The annotated corpus and the guidelines are available at http://www.csse.monash.edu.au/research/umnl/data/index.shtml.


Challenges in Information Extraction from Tables in Biomedical Research Publications: a Dataset Analysis
Tatyana Shmanina | Lawrence Cavedon | Ingrid Zukerman
Proceedings of the Australasian Language Technology Association Workshop 2014


Impact of Corpus Diversity and Complexity on NER Performance
Tatyana Shmanina | Ingrid Zukerman | Antonio Jimeno Yepes | Lawrence Cavedon | Karin Verspoor
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)