Richard Fothergill


2016

Evaluating a Topic Modelling Approach to Measuring Corpus Similarity
Richard Fothergill | Paul Cook | Timothy Baldwin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Web corpora are often constructed automatically, and their contents are therefore often not well understood. One technique for assessing the composition of such a web corpus is to empirically measure its similarity to a reference corpus whose composition is known. In this paper we evaluate a number of measures of corpus similarity, including a method based on topic modelling which has not been previously evaluated for this task. To evaluate these methods we use known-similarity corpora that have been previously used for this purpose, as well as a number of newly-constructed known-similarity corpora targeting differences in genre, topic, time, and region. Our findings indicate that, overall, the topic modelling approach did not improve on a chi-square method that had previously been found to work well for measuring corpus similarity.

2015

RoseMerry: A Baseline Message-level Sentiment Classification System
Huizhi Liang | Richard Fothergill | Timothy Baldwin
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2012

Combining resources for MWE-token classification
Richard Fothergill | Timothy Baldwin
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Fleshing it out: A Supervised Approach to MWE-token and MWE-type Classification
Richard Fothergill | Timothy Baldwin
Proceedings of 5th International Joint Conference on Natural Language Processing