Remco van Veenendaal


2014

A decade of HLT Agency activities in the Low Countries: from resource maintenance (BLARK) to service offerings (BLAISE)
Peter Spyns | Remco van Veenendaal
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we report on the Flemish-Dutch Agency for Human Language Technologies (HLT Agency, or TST-Centrale in Dutch) in the Low Countries and present its activities during its first decade of existence. The main goal of the HLT Agency is to ensure the sustainability of linguistic resources for Dutch. Ten years after its inception, the HLT Agency faces new challenges and opportunities. An important contextual factor is the rise of infrastructure networks and the proliferation of resource centres. We summarise some lessons learnt and, as future work, propose to define and build for Dutch (and, by extension, for any national language) a set of Basic LAnguage Infrastructure SErvices (BLAISE). We conclude that the HLT Agency, partly owing to its particular institutional status, has fulfilled and still fulfils an important role in keeping Dutch a fully fledged, functional language in the digital domain.

2010

Resource and Service Centres as the Backbone for a Sustainable Service Infrastructure
Peter Wittenburg | Nuria Bel | Lars Borin | Gerhard Budin | Nicoletta Calzolari | Eva Hajicova | Kimmo Koskenniemi | Lothar Lemnitzer | Bente Maegaard | Maciej Piasecki | Jean-Marie Pierrel | Stelios Piperidis | Inguna Skadina | Dan Tufis | Remco van Veenendaal | Tamas Váradi | Martin Wynne
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Research infrastructures are currently being designed and established in many disciplines, since all of them suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools, the CLARIN initiative has been funded since 2008 to overcome many of the integration and interoperability hurdles. CLARIN can build on knowledge and work from many projects carried out in recent years, and aims to build stable and robust services that researchers can use. Service centres that have the potential to be persistent and that adhere to the criteria established by CLARIN will play an important role here. In the last year of the so-called preparatory phase, these centres are developing four use cases that demonstrate how the various pillars CLARIN has been working on can be integrated. All four use cases fulfil the criterion of being cross-national.

2008

Building a Federation of Language Resource Repositories: the DAM-LR Project and its Continuation within CLARIN
Daan Broeder | David Nathan | Sven Strömqvist | Remco van Veenendaal
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The DAM-LR project aims to virtually integrate various European language resource archives, allowing users to navigate and operate in a single unified domain of language resources. This type of integration introduces Grid technology to the humanities disciplines and forms a federation of archives. The complete architecture is based on a few well-known components. This is considered the basis for building a research infrastructure for language resources, as planned within the CLARIN project. The DAM-LR project was deliberately started with only a small number of participants, for flexibility and to avoid complex contract negotiations on legal issues. Now that we have gained insight into the basic technological and organisational issues, we foresee that the federation will be expanded considerably within the CLARIN project, which will also address the associated legal issues.

Standardising Bilingual Lexical Resources According to the Lexicon Markup Framework
Isa Maks | Carole Tiberius | Remco van Veenendaal
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The Dutch HLT agency for language and speech technology (known as TST-centrale) at the Institute for Dutch Lexicology is responsible for the maintenance, distribution and accessibility of (Dutch) digital language resources. In this paper we present a project that aims to standardise the format of a set of bilingual lexicons in order to make them available to potential users, to facilitate the exchange of data (between these resources and with other, monolingual resources) and to enable reuse of these lexicons for NLP applications such as machine translation and multilingual information retrieval. We pay special attention to the methods and tools we used and to some of the problematic issues we encountered during the conversion process. As these problems are mainly caused by the fact that the standard LMF model fails to represent the detailed semantic and pragmatic distinctions made in our bilingual data, we propose some modifications to the standard. In general, we think that a lexicon standard should provide a model for bilingual lexicons that can represent all the detailed, fine-grained translation information generally found in such lexicons.

2006

Functioning of the Centre for Dutch Language and Speech Technology
Michel Boekestein | Griet Depoorter | Remco van Veenendaal
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

The TST Centre manages a broad collection of Dutch digital language resources. It is an initiative of the Dutch Language Union (Nederlandse Taalunie) and is meant to reinforce research in the area of language and speech technology, which it does by stimulating the reuse of these language resources. The TST Centre keeps these resources up to date, facilitates their availability, and offers services such as providing information, documentation, online access, catalogues and custom-made data. The TST Centre also strives for a uniform, if not standardised, treatment of language resources of the same nature. A well-thought-out, structured administration system is needed to manage the various language resources, their updates, derived products, IPR, user administration, etc. We discuss the organisation, tasks and services of the TST Centre and the language resources it maintains. We also look into practical data management solutions, IPR issues, and our activities in standardising and linking language resources.