Robert Dale


2012

Computer-based aids for writing assistance have been around since at least the early 1980s, focussing primarily on aspects such as spelling, grammar and style. The potential audience for such tools is very large indeed, and this is a clear case where we might expect to see language processing applications having a significant real-world impact. However, existing comparative evaluations of applications in this space are often no more than impressionistic and anecdotal reviews of commercial offerings as found in software magazines, making it hard to determine which approaches are superior. More rigorous evaluation in the scholarly literature has been held back in particular by the absence of shared datasets of texts marked-up with errors, and the lack of an agreed evaluation framework. Significant collections of publicly available data are now appearing; this paper describes a complementary evaluation framework, which has been piloted in the Helping Our Own shared task. The approach, which uses stand-off annotations for representing edits to text, can be used in a wide variety of text-correction tasks, and easily accommodates different error tagsets.

2011

2010

A central purpose of referring expressions is to distinguish intended referents from other entities that are in the context; but how is this context determined? This paper draws a distinction between discourse context ―other entities that have been mentioned in the dialogue― and visual context ―visually available objects near the intended referent. It explores how these two different aspects of context have an impact on subsequent reference in a dialogic situation where the speakers share both discourse and visual context. In addition we take into account the impact of the reference history ―forms of reference used previously in the discourse― on forming what have been called conceptual pacts. By comparing the output of different parameter settings in our model to a data set of human-produced referring expressions, we determine that an approach to subsequent reference based on conceptual pacts provides a better explanation of our data than previously proposed algorithmic approaches which compute a new distinguishing description for the intended referent every time it is mentioned.

2009

2008

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
Krahmer et al.’s (2003) graph-based framework provides an elegant and flexible approach to the generation of referring expressions. In this paper, we present the first reported study that systematically investigates how to tune the parameters of the graph-based framework on the basis of a corpus of human-generated descriptions. We focus in particular on replicating the redundant nature of human referring expressions, whereby properties not strictly necessary for identifying a referent are nonetheless included in descriptions. We show how statistics derived from the corpus data can be integrated to boost the framework’s performance over a non-stochastic baseline.

2007

2006

The recognition of named entities is now a well-developed area, with a range of symbolic and machine learning techniques that deliver high accuracy extraction and categorisation of a variety of entity types. However, there are still some named entity phenomena that present problems for existing techniques; in particular, relatively little work has explored the disambiguation of conjunctions appearing in candidate named entity strings. We demonstrate that there are in fact four distinct uses of conjunctions in the context of named entities; we present some experiments using machine-learned classifiers to disambiguate the different uses of the conjunction, with 85% of test examples being correctly classified.

2005

2004

2003

2002

2000

1998

1996

1993

1992

1991

1989