Remco van Veenendaal
In this paper we report on the Flemish-Dutch Agency for Human Language Technologies (HLT Agency or TST-Centrale in Dutch) in the Low Countries. We present its activities in its first decade of existence. The main goal of the HLT Agency is to ensure the sustainability of linguistic resources for Dutch. Ten years after its inception, the HLT Agency faces new challenges and opportunities. An important contextual factor is the rise of infrastructure networks and the proliferation of resource centres. We summarise some lessons learnt and we propose as future work to define and build for Dutch (which by extension can apply to any national language) a set of Basic LAnguage Infrastructure SErvices (BLAISE). In conclusion, we state that the HLT Agency, partly because of its peculiar institutional status, has fulfilled and is still fulfilling an important role in maintaining Dutch as a digitally fully fledged functional language.
Currently, research infrastructures are being designed and established in many disciplines, since they all suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools, the CLARIN initiative has been funded since 2008 to overcome many of the integration and interoperability hurdles. CLARIN can build on knowledge and work from many projects carried out in recent years, and aims to build stable and robust services that researchers can use. Service centres that have the potential to be persistent and that adhere to the criteria established by CLARIN will play an important role here. In the last year of the so-called preparatory phase, these centres are developing four use cases that demonstrate how the various pillars CLARIN has been working on can be integrated. All four use cases fulfil the criterion of being cross-national.
The DAM-LR project aims at virtually integrating various European language resource archives so that users can navigate and operate in a single unified domain of language resources. This type of integration introduces Grid technology to the humanities disciplines and forms a federation of archives. The complete architecture is designed based on a few well-known components. This is considered the basis for building a research infrastructure for Language Resources, as is planned within the CLARIN project. The DAM-LR project was purposefully started with only a small number of participants for flexibility and to avoid complex contract negotiations with respect to legal issues. Now that we have gained insights into the basic technology and organizational issues, it is foreseen that the federation will be expanded considerably within the CLARIN project, which will also address the associated legal issues.
The Dutch HLT agency for language and speech technology (known as TST-centrale) at the Institute for Dutch Lexicology is responsible for the maintenance, distribution and accessibility of (Dutch) digital language resources. In this paper we present a project which aims to standardise the format of a set of bilingual lexicons in order to make them available to potential users, to facilitate the exchange of data (both among the resources themselves and with other, monolingual resources) and to enable reuse of these lexicons for NLP applications such as machine translation and multilingual information retrieval. We pay special attention to the methods and tools we used and to some of the problematic issues we encountered during the conversion process. As these problems are mainly caused by the fact that the standard LMF model fails to represent the detailed semantic and pragmatic distinctions made in our bilingual data, we propose some modifications to the standard. In general, we think that a standard for lexicons should provide a model for bilingual lexicons that is able to represent all the detailed and fine-grained translation information generally found in these types of lexicons.
The TST Centre manages a broad collection of Dutch digital language resources. It is an initiative of the Dutch Language Union (Nederlandse Taalunie), and is meant to reinforce research in the area of language and speech technology. It does this by stimulating the reuse of these language resources. The TST Centre keeps these resources up to date, facilitates their availability, and offers services such as information, documentation, online access, catalogues, custom-made data, etc. The TST Centre also strives for a uniform, if not standardised, treatment of language resources of the same nature. A well-thought-out, structured administration system is needed to manage the various language resources, their updates, derived products, IPR, user administration, etc. We will discuss the organisation, tasks and services of the TST Centre, and the language resources it maintains. We will also look into practical data management solutions, IPR issues, and our activities in standardising and linking language resources.