Martin Emms


2016

As time passes, words can acquire meanings they did not previously have, such as the ‘twitter post’ usage of ‘tweet’. We address how this can be detected from time-stamped raw text. We propose a generative model in which senses depend on times and context words depend on senses but are otherwise eternal, together with a Gibbs sampler for estimation. We obtain promising parameter estimates for positive (resp. negative) cases of known sense emergence (resp. non-emergence) and adapt the ‘pseudo-word’ technique (Schütze, 1992) to give a novel further evaluation via ‘pseudo-neologisms’. The question of ground truth is also addressed, and a technique is proposed to locate an emergence date for evaluation purposes.
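The generative story in this abstract (a sense drawn given the time, then context words drawn given the sense alone) can be sketched as below. This is only an illustrative toy under stated assumptions: the function name, the two-sense inventory, the vocabulary and all probabilities are hypothetical and not taken from the paper, and the Gibbs sampler used for estimation is not shown.

```python
import random

# Hypothetical toy parameters (not from the paper).
# P(sense | year): the 'post' sense is absent in 2005 and present in 2012.
SENSE_GIVEN_YEAR = {
    2005: {"bird": 1.0, "post": 0.0},
    2012: {"bird": 0.3, "post": 0.7},
}
# P(context word | sense): deliberately has no dependence on the year.
WORD_GIVEN_SENSE = {
    "bird": {"song": 0.5, "nest": 0.5},
    "post": {"account": 0.5, "follower": 0.5},
}


def sample_occurrence(year, n_context, rng):
    """Draw a sense and n_context context words for one target occurrence."""
    sense_dist = SENSE_GIVEN_YEAR[year]
    senses = list(sense_dist)
    # Step 1: the sense depends on the time stamp.
    sense = rng.choices(senses, weights=[sense_dist[s] for s in senses], k=1)[0]
    word_dist = WORD_GIVEN_SENSE[sense]
    vocab = list(word_dist)
    # Step 2: context words depend only on the sense.
    context = rng.choices(vocab, weights=[word_dist[w] for w in vocab], k=n_context)
    return sense, context


rng = random.Random(0)
early = [sample_occurrence(2005, 3, rng) for _ in range(200)]
late = [sample_occurrence(2012, 3, rng) for _ in range(200)]
```

In this toy setting every 2005 occurrence receives the older sense, while the 2012 occurrences mix both; estimation would run in the opposite direction, inferring the per-year sense distribution from the observed context words.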

2015

2014

2011

2009

2008

Some alternatives to the standard evalb measures for parser evaluation are considered, principally the use of a tree-distance measure, which assigns a score to a linearity- and ancestry-respecting mapping between trees, in contrast to the evalb measures, which assign a score to a span-preserving mapping. Additionally, analysis of the evalb measures suggests some further variants, concerning different normalisations, the portions of a tree compared, and whether scores should be micro- or macro-averaged. The outputs of six parsing systems on Section 23 of the Penn Treebank were taken. It is shown that the ranking of the parsing systems varies as the alternative evaluation measures are used. For a fixed parsing system, it is also shown that the ranking of the parses from best to worst varies according to whether the evalb or the tree-distance measure is used. It is argued that the tree-distance measure ameliorates a previously noted problem of over-penalisation of attachment errors.
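The micro versus macro averaging distinction mentioned in this abstract can be illustrated with a small sketch of labelled-bracket F1. It assumes each parse is given as a list of hypothetical (label, start, end) spans and that the list of sentence pairs is non-empty; it does not reproduce evalb's parameter-file behaviour (for example punctuation handling), nor the specific variants studied in the paper.

```python
from collections import Counter


def bracket_counts(gold, test):
    """Return (matched, gold total, test total) for one sentence.

    Spans are compared as multisets, so a repeated bracket can only be
    matched as many times as it occurs on both sides.
    """
    g, t = Counter(gold), Counter(test)
    matched = sum((g & t).values())
    return matched, sum(g.values()), sum(t.values())


def f1(matched, n_gold, n_test):
    """Harmonic mean of bracket precision and recall."""
    if matched == 0:
        return 0.0
    precision = matched / n_test
    recall = matched / n_gold
    return 2 * precision * recall / (precision + recall)


def micro_f1(pairs):
    """Pool counts over all sentences, then compute a single F1."""
    totals = [sum(col) for col in zip(*(bracket_counts(g, t) for g, t in pairs))]
    return f1(*totals)


def macro_f1(pairs):
    """Compute F1 per sentence, then take the unweighted mean."""
    scores = [f1(*bracket_counts(g, t)) for g, t in pairs]
    return sum(scores) / len(scores)


# Toy data: a short sentence parsed perfectly, a longer one half right.
short_gold = [("S", 0, 2), ("NP", 0, 1)]
short_test = list(short_gold)
long_gold = [("X", i, i + 1) for i in range(8)]
long_test = long_gold[:4] + [("Y", i, i + 1) for i in range(4)]
pairs = [(short_gold, short_test), (long_gold, long_test)]

micro = micro_f1(pairs)  # 6 matched of 10 gold and 10 test brackets
macro = macro_f1(pairs)  # mean of per-sentence scores 1.0 and 0.5
```

On this toy data the micro average is about 0.6 and the macro average is 0.75, because macro averaging gives the short, perfectly parsed sentence the same weight as the long one. Differences of this kind are one way system rankings can shift between the two averaging schemes.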

2006

1993