Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
Hennie Brugman
Martin Reynaert
Nicoline van der Sijs
René van Stipriaan
Erik Tjong Kim Sang
Antal van den Bosch
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The Nederlab project aims to bring together all digitized texts relevant to the Dutch national heritage and to the history of the Dutch language and culture (circa 800 – present) in one user-friendly, tool-enriched, open-access web interface. This paper describes Nederlab halfway through the project period and discusses the collections incorporated, the back-office processes, the system back-end, and the Nederlab Research Portal end-user web application.
Relevance of ASR for the Automatic Generation of Keywords Suggestions for TV programs
Véronique Malaisé
Luit Gazendam
Willemijn Heeren
Roeland Ordelman
Hennie Brugman
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
Semantic access to multimedia content in audiovisual archives is to a large extent dependent on quantity and quality of the metadata, and particularly the content descriptions that are attached to the individual items. However, the manual annotation of collections puts heavy demands on resources. A large number of archives are introducing (semi) automatic annotation techniques for generating and/or enhancing metadata. The NWO funded CATCH-CHOICE project has investigated the extraction of keywords from textual resources related to TV programs to be archived (context documents), in collaboration with the Dutch audiovisual archives, Sound and Vision. This paper investigates the suitability of Automatic Speech Recognition transcripts produced in the CATCH-CHoral project for generating such keywords, which we evaluate against manual annotations of the documents, and against keywords automatically generated from context documents describing the TV programs’ content.
A Common Multimedia Annotation Framework for Cross Linking Cultural Heritage Digital Collections
Hennie Brugman
Véronique Malaisé
Laura Hollink
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In the context of the CATCH research program, which is currently carried out at a number of large Dutch cultural heritage institutions, our ambition is to combine and exchange heterogeneous multimedia annotations between projects and institutions. As a first step we designed an Annotation Meta Model (AMM): a simple but powerful RDF/OWL model mainly addressing the anchoring of annotations to segments of the many different media types used in the collections of the archives, museums and libraries involved. The model includes support for the annotation of annotations themselves, and of segments of annotation values, so that annotations can be layered and projects can process each other's annotation data as the primary data for further annotation. On the basis of the AMM we designed an application programming interface for accessing annotation repositories and implemented it both as a software library and as a web service. Finally, we report on our experiences with applying the model, API and repository when developing web applications for collection managers in cultural heritage institutions.
Disambiguating automatic semantic annotation based on a thesaurus structure
Véronique Malaisé
Luit Gazendam
Hennie Brugman
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
The use/use for relationship in a thesaurus is usually more complex than the (para-)synonymy recommended in the ISO-2788 standard describing the content of these controlled vocabularies. The fact that a non-preferred term can refer to multiple preferred terms (only the latter are relevant in controlled indexing) makes this relationship difficult to use in automatic annotation applications: it generates ambiguity cases. In this paper, we present the CARROT algorithm, meant to rank the output of our Information Extraction pipeline, and show how this algorithm can be used to select the relevant preferred term out of different possibilities. This selection is meant to provide keyword suggestions to human annotators, in order to ease and speed up their daily process, and is based on the structure of their thesaurus. We achieve a 95% success rate, and discuss these results along with perspectives for this experiment.
Anchoring Dutch Cultural Heritage Thesauri to WordNet: Two Case Studies
Véronique Malaisé
Antoine Isaac
Luit Gazendam
Hennie Brugman
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)
ELAN: a Professional Framework for Multimodality Research
Peter Wittenburg
Hennie Brugman
Albert Russel
Alex Klassmann
Han Sloetjes
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The use of computer tools in linguistic research has gained importance with the maturation of media frameworks for handling digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has led to stronger requirements on the flexibility, the efficiency and in particular the time accuracy of annotation tools. This paper describes the efforts made to make ELAN a tool that meets these requirements, with special attention to the developments in the area of time accuracy. Subsequent sections give an overview of other enhancements in the latest versions of ELAN that make it a useful tool in multimodality research.
A Web Based General Thesaurus Browser to Support Indexing of Television and Radio Programs
Hennie Brugman
Véronique Malaisé
Luit Gazendam
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Documentation and retrieval processes at the Netherlands Institute for Sound and Vision are organized around a common thesaurus. To help improve the quality of these processes, the thesaurus was transformed into an RDF/OWL ontology and extended on the basis of implicit information and external resources. A thesaurus browser web application was designed, implemented and tested on future users.
Architecture for Distributed Language Resource Management and Archiving
Peter Wittenburg
Heidi Johnson
Markus Buchhorn
Hennie Brugman
Daan Broeder
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
An architecture is presented that provides an integrated framework for managing, archiving and accessing language resources. This architecture was discussed in the DELAMAN network – a world-wide network of archives holding material about endangered languages. Such a framework will be built upon a metadata infrastructure, a mechanism to resolve unique resource identifiers, and user and access rights management components. These components are closely related and have to be based on redundant and distributed services. Existing middleware seems to be available for all these components; however, it remains to be checked how they can interact with each other.
Collaborative Annotation of Sign Language Data with Peer-to-Peer Technology
Hennie Brugman
Onno Crasborn
Albert Russel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Web Services Architecture for Language Resources
Angelo Dalli
Valentin Tablan
Kalina Bontcheva
Yorick Wilks
Daan Broeder
Hennie Brugman
Peter Wittenburg
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Annotating Multi-media/Multi-modal Resources with ELAN
Hennie Brugman
Albert Russel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Multimedia Annotation with Multilingual Input Methods and Search Support
Hennie Brugman
Harriet Spenke
Markus Kramer
Alexander Klassmann
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
Multimodal Annotations in Gesture and Sign Language Studies
P. Wittenburg
St. Levinson
S. Kita
H. Brugman
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
The EUDICO Project, Multi Media Annotation over the Internet
Albert Russel
Hennie Brugman
Daan Broeder
Peter Wittenburg
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
Towards a Standard for Meta-descriptions of Language Resources
D. Broeder
H. Brugman
A. Russel
R. Skiba
P. Wittenburg
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
An Experiment in Unifying Audio-Visual and Textual Infrastructures for Language Processing Research and Development
Kalina Bontcheva
Hennie Brugman
Hamish Cunningham
Albert Russel
Peter Wittenburg
Proceedings of the COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems