Neil Newbold


2010

The Linguistics of Readability: The Next Step for Word Processing
Neil Newbold | Lee Gillam
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids

2008

Lexical Ontology Extraction using Terminology Analysis: Automating Video Annotation
Neil Newbold | Bogdan Vrusias | Lee Gillam
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The majority of the work described in this paper was conducted as part of the Recovering Evidence from Video by fusing Video Evidence Thesaurus and Video MetaData (REVEAL) project, sponsored by the UK’s Engineering and Physical Sciences Research Council (EPSRC). REVEAL is concerned with reducing the time-consuming, yet essential, tasks undertaken by UK police officers when dealing with terascale collections of video related to crime scenes. The project is working towards technologies that will archive video annotated automatically on the basis of prior annotations of similar content, enabling rapid access to CCTV archives and providing capabilities for automatic video summarisation. This involves considerations of semantic annotation relating, amongst other things, to content and to temporal reasoning. In this paper, we describe the ontology extraction components of the system in development, and its use in REVEAL for automatically populating a CCTV ontology from analysis of expert transcripts of the video footage.

Automatic Document Quality Control
Neil Newbold | Lee Gillam
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper focuses on automatically improving the readability of documents. We explore content control mechanisms that could be used (i) by authors to improve the quality and consistency of the language used in authoring; and (ii) to demonstrate this quality to readers. To achieve this, we implemented and evaluated a number of software components, including those of the University of Surrey Department of Computing’s content analysis applications (System Quirk). The software integrates these components within the commonly available GATE software and incorporates language resources considered useful within the standards development process: a Plain English thesaurus; lookup of ISO terminology provided from a terminology management system (TMS) via ISO 16642; automatic terminology discovery using statistical and linguistic techniques; and readability metrics. The results led us to develop an assistive tool, initially for authors of standards but not limited to them, and a system that automatically annotates texts to help readers understand them. We describe the system developed and made freely available under the auspices of the EU eContent project LIRICS.