2018
Signbank: Software to Support Web Based Dictionaries of Sign Language
Steve Cassidy | Onno Crasborn | Henri Nieminen | Wessel Stoop | Micha Hulsbosch | Susan Even | Erwin Komen | Trevor Johnston
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2017
Overview of the 2017 ALTA Shared Task: Correcting OCR Errors
Diego Mollá-Aliod | Steve Cassidy
Proceedings of the Australasian Language Technology Association Workshop 2017
2016
Publishing the Trove Newspaper Corpus
Steve Cassidy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The Trove Newspaper Corpus is derived from the National Library of Australia’s digital archive of newspaper text. The corpus is a snapshot of the NLA collection taken in 2015 to be made available for language research as part of the Alveo Virtual Laboratory and contains 143 million articles dating from 1806 to 2007. This paper describes the work we have done to make this large corpus available as a research collection, facilitating access to individual documents and enabling large scale processing of the newspaper text in a cloud-based environment.
2015
Finding Names in Trove: Named Entity Recognition for Australian Historical Newspapers
Sunghwan Mac Kim | Steve Cassidy
Proceedings of the Australasian Language Technology Association Workshop 2015
2014
Alveo, a Human Communication Science Virtual Laboratory
Dominique Estival | Steve Cassidy
Proceedings of the Australasian Language Technology Association Workshop 2014
Integrating UIMA with Alveo, a human communication science virtual laboratory
Dominique Estival | Steve Cassidy | Karin Verspoor | Andrew MacKinlay | Denis Burnham
Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT
AusTalk: an audio-visual corpus of Australian English
Dominique Estival | Steve Cassidy | Felicity Cox | Denis Burnham
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes the AusTalk corpus, which was designed and created through the Big ASC, a collaborative project with the two main goals of providing a standardised infrastructure for audio-visual recordings in Australia and of producing a large audio-visual corpus of Australian English, with 3 hours of AV recordings for 1000 speakers. We first present the overall project, then describe the corpus itself and its components, the strict data collection protocol with high levels of standardisation and automation, and the processes put in place for quality control. We also discuss the annotation phase of the project, along with its goals and challenges; a major contribution of the project has been to explore procedures for automating annotations and we present our solutions. We conclude with the current status of the corpus and with some examples of research already conducted with this new resource. AusTalk is one of the corpora included in the HCS vLab, which is briefly sketched in the conclusion.
The Alveo Virtual Laboratory: A Web Based Repository API
Steve Cassidy | Dominique Estival | Timothy Jones | Denis Burnham | Jared Burghold
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The Human Communication Science Virtual Laboratory (HCS vLab) is an eResearch project funded under the Australian Government NeCTAR program to build a platform for collaborative eResearch around data representing human communication and the tools that researchers use in their analysis. The human communication science field is broadly defined to encompass the study of language from various perspectives but also includes research on music and various other forms of human expression. This paper outlines the core architecture of the HCS vLab and in particular, highlights the web based API that provides access to data and tools to authenticated users.
2013
Interoperable Annotation in the Australian National Corpus
Steve Cassidy
Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation
2012
The Australian National Corpus: National Infrastructure for Language Resources
Steve Cassidy | Michael Haugh | Pam Peters | Mark Fallu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The Australian National Corpus has been established in an effort to make currently scattered and relatively inaccessible data available to researchers through an online portal. In contrast to other national corpora, it is conceptualised as a linked collection of many existing and future language resources representing language use in Australia, unified through common technical standards. This approach allows us to bootstrap a significant collection and add value to existing resources by providing a unified, online tool-set to support research in a number of disciplines. This paper provides an outline of the technical platform being developed to support the corpus and a brief overview of some of the collections that form part of the initial version of the Australian National Corpus.
2009
Ingesting the Auslan Corpus into the DADA Annotation Store
Steve Cassidy | Trevor Johnston
Proceedings of the Third Linguistic Annotation Workshop (LAW III)
2007
Named Entity Recognition in Question Answering of Speech Data
Diego Mollá | Menno van Zaanen | Steve Cassidy
Proceedings of the Australasian Language Technology Workshop 2007
2005
Formal Grammars for Linguistic Treebank Queries
Mark Dras | Steve Cassidy
Proceedings of the Australasian Language Technology Workshop 2005
2002
XQuery as an Annotation Query Language: a Use Case Analysis
Steve Cassidy
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)