2010
The CALBC Silver Standard Corpus for Biomedical Named Entities — A Study in Harmonizing the Contributions from Four Independent Named Entity Taggers
Dietrich Rebholz-Schuhmann | Antonio José Jimeno Yepes | Erik M. van Mulligen | Ning Kang | Jan Kors | David Milward | Peter Corbett | Ekaterina Buyko | Katrin Tomanek | Elena Beisswanger | Udo Hahn
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
The production of gold standard corpora is time-consuming and costly. We propose an alternative: the "silver standard corpus" (SSC), a corpus that has been generated by the harmonisation of the annotations that have been delivered from a selection of annotation systems. The systems have to share the type system for the annotations, and the harmonisation solution has to use a suitable similarity measure for the pair-wise comparison of the annotations. The annotation systems have been evaluated against the harmonised set (630,324 sentences, 15,956,841 tokens). We can demonstrate that the annotation of proteins and genes shows higher diversity across all used annotation solutions, leading to a lower agreement against the harmonised set in comparison to the annotations of diseases and species. An analysis of the most frequent annotations from all systems shows that a high agreement amongst systems leads to the selection of terms that are suitable to be kept in the harmonised set. This is the first large-scale approach to generate an annotated corpus from automated annotation systems. Further research is required to understand how the annotations from different systems have to be combined to produce the best annotation result for a harmonised corpus.
2008
Cascaded Classifiers for Confidence-Based Chemical Named Entity Recognition
Peter Corbett | Ann Copestake
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
Language Resources and Chemical Informatics
C.J. Rupp | Ann Copestake | Peter Corbett | Peter Murray-Rust | Advaith Siddharthan | Simone Teufel | Benjamin Waldron
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Chemistry research papers are a primary source of information about chemistry, as in any scientific field. The presentation of the data is, predominantly, unstructured information, and so not immediately susceptible to processes developed within chemical informatics for carrying out chemistry research by information processing techniques. At one level, extracting the relevant information from research papers is a text mining task, requiring both extensive language resources and specialised knowledge of the subject domain. However, the papers also encode information about the way the research is conducted and the structure of the field itself. Applying language technology to research papers in chemistry can facilitate eScience on several different levels. The SciBorg project sets out to provide an extensive, analysed corpus of published chemistry research. This relies on the cooperation of several journal publishers to provide papers in an appropriate form. The work is carried out as a collaboration involving the Computer Laboratory, Chemistry Department and eScience Centre at Cambridge University, and is funded under the UK eScience programme.
2007
Semantic enrichment of journal articles using chemical named entity recognition
Colin R. Batchelor | Peter T. Corbett
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions
Annotation of Chemical Named Entities
Peter Corbett | Colin Batchelor | Simone Teufel
Biological, translational, and clinical language processing