Mark A. Greenwood

Also published as: Mark Greenwood

2016

GATE-Time: Extraction of Temporal Expressions and Events
Leon Derczynski | Jannik Strötgen | Diana Maynard | Mark A. Greenwood | Manuel Jung
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

GATE is a widely used open-source solution for text processing with a large user community. It contains components for several natural language processing tasks. However, temporal information extraction functionality within GATE has been rather limited so far, despite being a prerequisite for many application scenarios in the areas of natural language processing and information retrieval. This paper presents an integrated approach to temporal information processing. We take state-of-the-art tools in temporal expression and event recognition and bring them together to form an openly available resource within the GATE infrastructure. GATE-Time provides annotation in the form of TimeML events and temporal expressions complying with this mature ISO standard for temporal semantic annotation of documents. Major advantages of GATE-Time are (i) that it relies on HeidelTime for temporal tagging, so that temporal expressions can be extracted and normalized in multiple languages and across different domains, (ii) that it includes a modern, fast event recognition and classification tool, and (iii) that it can be combined with different linguistic pre-processing annotations, and is thus not bound to license-restricted preprocessing components.

2014

Who cares about Sarcastic Tweets? Investigating the Impact of Sarcasm on Sentiment Analysis.
Diana Maynard | Mark Greenwood
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Sarcasm is a common phenomenon in social media, and is inherently difficult to analyse, not just automatically but often for humans too. It has an important effect on sentiment, but is usually ignored in social media analysis, because it is considered too tricky to handle. While there exist a few systems which can detect sarcasm, almost no work has been carried out on studying the effect that sarcasm has on sentiment in tweets, and on incorporating this into automatic tools for sentiment analysis. We perform an analysis of the effect of sarcasm scope on the polarity of tweets, and have compiled a number of rules which enable us to improve the accuracy of sentiment analysis when sarcasm is known to be present. We consider in particular the effect of sentiment and sarcasm contained in hashtags, and have developed a hashtag tokeniser for GATE, so that sentiment and sarcasm found within hashtags can be detected more easily. According to our experiments, the hashtag tokenisation achieves 98% Precision, while the sarcasm detection achieves 91% Precision and the polarity detection 80%.

2013

TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
Kalina Bontcheva | Leon Derczynski | Adam Funk | Mark Greenwood | Diana Maynard | Niraj Aswani
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

Large Scale Semantic Annotation, Indexing and Search at The National Archives
Diana Maynard | Mark A. Greenwood
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes a tool developed to improve access to the enormous volume of data housed at the UK's National Archives, both for the general public and for specialist researchers. The system we have developed, TNA-Search, enables a multi-paradigm search over the entire electronic archive (42TB of data in various formats). The search functionality allows queries that arbitrarily mix any combination of full-text, structural, linguistic and semantic queries. The archive is annotated and indexed with respect to a massive semantic knowledge base containing data from the LOD cloud, data.gov.uk, related TNA projects, and a large geographical database. The semantic annotation component achieves approximately 83% F-measure, which is very reasonable considering the wide range of entities and document types and the open domain. The technologies are being adopted by real users at The National Archives and will form the core of their suite of search tools, with additional in-house interfaces.

2009

Too Many Mammals: Improving the Diversity of Automatically Recognized Terms
Ziqi Zhang | Lei Xia | Mark A. Greenwood | José Iria
Proceedings of the International Conference RANLP-2009

2008

Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering
Mark A. Greenwood
Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering

A Data Driven Approach to Query Expansion in Question Answering
Leon Derczynski | Jun Wang | Robert Gaizauskas | Mark A. Greenwood
Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering

Evaluation of Automatically Reformulated Questions in Question Series
Richard Shaw | Ben Solway | Robert Gaizauskas | Mark A. Greenwood
Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering

Saxon: an Extensible Multimedia Annotator
Mark Greenwood | José Iria | Fabio Ciravegna
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper introduces Saxon, a rule-based document annotator that is capable of processing and annotating several document formats and media, both within and across documents. Furthermore, Saxon is readily extensible to support other input formats due to both its flexible rule formalism and the modular plugin architecture of the Runes framework upon which it is built. In this paper we introduce the Saxon rule formalism through examples aimed at highlighting its power and flexibility.
