Hennie Brugman

Also published as: H. Brugman

2016

Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
Hennie Brugman | Martin Reynaert | Nicoline van der Sijs | René van Stipriaan | Erik Tjong Kim Sang | Antal van den Bosch
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Nederlab project aims to bring together all digitized texts relevant to the Dutch national heritage and to the history of the Dutch language and culture (circa 800 – present) in one user-friendly, tool-enriched, open-access web interface. This paper describes Nederlab halfway through the project period and discusses the collections incorporated, the back-office processes, the system back-end, and the Nederlab Research Portal end-user web application.

2009

Relevance of ASR for the Automatic Generation of Keywords Suggestions for TV programs
Véronique Malaisé | Luit Gazendam | Willemijn Heeren | Roeland Ordelman | Hennie Brugman
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Semantic access to multimedia content in audiovisual archives depends to a large extent on the quantity and quality of the metadata, and particularly on the content descriptions attached to the individual items. However, the manual annotation of collections puts heavy demands on resources. A large number of archives are introducing (semi-)automatic annotation techniques for generating and/or enhancing metadata. The NWO-funded CATCH-CHOICE project has investigated the extraction of keywords from textual resources related to TV programs to be archived (context documents), in collaboration with the Dutch audiovisual archive Sound and Vision. This paper investigates the suitability of Automatic Speech Recognition transcripts produced in the CATCH-CHoral project for generating such keywords, which we evaluate against manual annotations of the documents and against keywords automatically generated from context documents describing the TV programs’ content.

2008

A Common Multimedia Annotation Framework for Cross Linking Cultural Heritage Digital Collections
Hennie Brugman | Véronique Malaisé | Laura Hollink
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In the context of the CATCH research program, which is currently being carried out at a number of large Dutch cultural heritage institutions, our ambition is to combine and exchange heterogeneous multimedia annotations between projects and institutions. As a first step we designed an Annotation Meta Model (AMM): a simple but powerful RDF/OWL model mainly addressing the anchoring of annotations to segments of the many different media types used in the collections of the archives, museums and libraries involved. The model includes support for the annotation of annotations themselves, and of segments of annotation values, so that annotations can be layered and projects can process each other's annotation data as the primary data for further annotation. On the basis of the AMM we designed an application programming interface for accessing annotation repositories and implemented it both as a software library and as a web service. Finally, we report on our experiences with applying the model, API and repository when developing web applications for collection managers in cultural heritage institutions.

2007

Disambiguating automatic semantic annotation based on a thesaurus structure
Véronique Malaisé | Luit Gazendam | Hennie Brugman
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

The use/use for relationship in a thesaurus is usually more complex than the (para-)synonymy recommended in the ISO-2788 standard describing the content of these controlled vocabularies. The fact that a non-preferred term can refer to multiple preferred terms (only the latter are relevant in controlled indexing) makes this relationship difficult to use in automatic annotation applications: it generates ambiguity cases. In this paper, we present the CARROT algorithm, meant to rank the output of our Information Extraction pipeline, and show how this algorithm can be used to select the relevant preferred term out of different possibilities. This selection is meant to provide keyword suggestions to human annotators, in order to ease and speed up their daily process, and is based on the structure of their thesaurus. We achieve a 95% success rate, and discuss these results along with perspectives for this experiment.

Anchoring Dutch Cultural Heritage Thesauri to WordNet: Two Case Studies
Véronique Malaisé | Antoine Isaac | Luit Gazendam | Hennie Brugman
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007).

2006

ELAN: a Professional Framework for Multimodality Research
Peter Wittenburg | Hennie Brugman | Albert Russel | Alex Klassmann | Han Sloetjes
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The use of computer tools in linguistic research has gained importance with the maturation of media frameworks for the handling of digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has led to stronger requirements on the flexibility, the efficiency and in particular the time accuracy of annotation tools. This paper describes the efforts made to make ELAN a tool that meets these requirements, with special attention to the developments in the area of time accuracy. In subsequent sections an overview is given of other enhancements in the latest versions of ELAN that make it a useful tool in multimodality research.

A Web Based General Thesaurus Browser to Support Indexing of Television and Radio Programs
Hennie Brugman | Véronique Malaisé | Luit Gazendam
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Documentation and retrieval processes at the Netherlands Institute for Sound and Vision are organized around a common thesaurus. To help improve the quality of these processes, the thesaurus was transformed into an RDF/OWL ontology and extended on the basis of implicit information and external resources. A thesaurus browser web application was designed, implemented and tested with future users.

2004

Architecture for Distributed Language Resource Management and Archiving
Peter Wittenburg | Heidi Johnson | Markus Buchhorn | Hennie Brugman | Daan Broeder
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

An architecture is presented that provides an integrated framework for managing, archiving and accessing language resources. This architecture was discussed in the DELAMAN network – a world-wide network of archives holding material about endangered languages. Such a framework will be built upon a metadata infrastructure, a mechanism to resolve unique resource identifiers, and user and access rights management components. These components are closely related and have to be based on redundant and distributed services. Existing middleware seems to be available for all these components; however, it remains to be checked how they can interact with each other.

Collaborative Annotation of Sign Language Data with Peer-to-Peer Technology
Hennie Brugman | Onno Crasborn | Albert Russel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
