Lee Gillam


“I Don’t Know Where He is Not”: Does Deception Research yet Offer a Basis for Deception Detectives?
Anna Vartapetiance | Lee Gillam
Proceedings of the Workshop on Computational Approaches to Deception Detection


The Linguistics of Readability: The Next Step for Word Processing
Neil Newbold | Lee Gillam
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids


Lexical Ontology Extraction using Terminology Analysis: Automating Video Annotation
Neil Newbold | Bogdan Vrusias | Lee Gillam
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08)

The majority of work described in this paper was conducted as part of the Recovering Evidence from Video by fusing Video Evidence Thesaurus and Video MetaData (REVEAL) project, sponsored by the UK’s Engineering and Physical Sciences Research Council (EPSRC). REVEAL is concerned with reducing the time-consuming, yet essential, tasks undertaken by UK Police Officers when dealing with terascale collections of video related to crime-scenes. The project is working towards technologies which will archive video that has been annotated automatically based on prior annotations of similar content, enabling rapid access to CCTV archives and providing capabilities for automatic video summarisation. This involves considerations of semantic annotation relating, amongst other things, to content and to temporal reasoning. In this paper, we describe the ontology extraction components of the system in development, and its use in REVEAL for automatically populating a CCTV ontology from analysis of expert transcripts of the video footage.

Automatic Document Quality Control
Neil Newbold | Lee Gillam
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08)

This paper focuses on automatically improving the readability of documents. We explore mechanisms relating to content control that could be used (i) by authors to improve the quality and consistency of the language used in authoring; and (ii) to find a means to demonstrate this to readers. To achieve this, we implemented and evaluated a number of software components, including those of the University of Surrey Department of Computing’s content analysis applications (System Quirk). The software integrates these components within the commonly available GATE software and incorporates language resources considered useful within the standards development process: a Plain English thesaurus; lookup of ISO terminology provided from a terminology management system (TMS) via ISO 16642; automatic terminology discovery using statistical and linguistic techniques; and readability metrics. Results lead us to the development of an assistive tool, initially for authors of standards but not considered to be limited only to such authors, and also to a system that provides automatic annotation of texts to help readers to understand them. We describe the system developed and made freely available under the auspices of the EU eContent project LIRICS.


Sentiments on a Grid: Analysis of Streaming News and Views
Khurshid Ahmad | Lee Gillam | David Cheng
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we report on constructing a finite state automaton comprising automatically extracted terminology and significant collocation patterns from a training corpus of specialist news (Reuters Financial News). The automaton can be used to unambiguously identify sentiment-bearing words that might be able to make or break people, companies, perhaps even governments. The paper presents the emerging face of corpus linguistics where a corpus is used to bootstrap both the terminology and the significant meaning-bearing patterns from the corpus. Most current content analysis software systems require a human coder to eyeball terms and sentiment words. Such an approach might yield very good quality results on small text collections, but when confronted with a 40-50 million word corpus it does not scale, and a large-scale computer-based approach is required. We report on the use of Grid computing technologies and techniques to cope with this analysis.


Standards for Language Codes: developing ISO 639
David Dalby | Lee Gillam | Christopher Cox | Debbie Garside
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)


Knowledge Exchange and Terminology Interchange: The Role of Standards
Lee Gillam | Khurshid Ahmad | David Dalby | Christopher Cox
Proceedings of Translating and the Computer 24