Jan Odijk

Also published as: J. Odijk


2024

This paper proposes a canonical form for Multiword Expressions (MWEs), in particular for the Dutch language. The canonical form can be enriched with all kinds of annotations that can be used to describe the properties of the MWE and its components. It also introduces the DUCAME (DUtch CAnonical Multiword Expressions) lexical resource with more than 11k MWEs in canonical form. DUCAME is used in MWE-Finder to automatically generate queries for searching for flexible MWEs in large text corpora.
This paper introduces and demonstrates MWE Finder, an application to search for flexible multiword expressions (MWEs) in Dutch text corpora, starting from an example. If the example is in canonical form, the application automatically generates three queries to search for sentences that contain an occurrence of the MWE and thus enables efficient analysis of its properties. Searching is done in treebanks, so the grammatical structure of the sentences is taken into account.

2022

2020

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI – including many opportunities, synergies but also misconceptions – has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

2018

2017

2016

I introduce CLARIAH in the Netherlands, which aims to contribute the Netherlands part of a Europe-wide humanities research infrastructure. I describe the digital turn in the humanities, the background and context of CLARIAH, both nationally and internationally, its relation to the CLARIN and DARIAH infrastructures, and the rationale for joining forces between CLARIN and DARIAH in the Netherlands. I also describe the first results of joining forces as achieved in the CLARIAH-SEED project, and the plans of the CLARIAH-CORE project, which is currently running.

2014

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.
In this paper I provide a high level overview of the major results of CLARIN-NL so far. I will show that CLARIN-NL is starting to provide the data, facilities and services in the CLARIN infrastructure to carry out humanities research supported by large amounts of data and tools. These services have easy interfaces and easy search options (no technical background needed). Still some training is required, to understand both the possibilities and the limitations of the data and the tools. Actual use of the facilities leads to suggestions for improvements and to suggestions for new functionality. All researchers are therefore invited to start using the elements in the CLARIN infrastructure offered by CLARIN-NL. Though I will show that a lot has been achieved in the CLARIN-NL project, I will also provide a long list of functionality and interoperability cases that have not been dealt with in CLARIN-NL and must remain for future work.

2012

In this paper we describe recent developments in the CLARIN-NL project with the goal of sharing information on and experiences in this project with the community outside of the Netherlands. We discuss a variety of subprojects to actually implement the infrastructure, to provide functionality for search in metadata and the actual data, resource curation and demonstration projects, the Data Curation Service, actions to improve semantic interoperability and coordinate work on it, involvement of CLARIN Data Providers, education and training, outreach activities, and cooperation with other projects. Based on these experiences, we provide some recommendations for related projects. The recommendations concern a variety of topics including the organisation of an infrastructure project as a function of the types of tasks that have to be carried out, involvement of the targeted users, metadata, semantic interoperability and the role of registries, measures to maximally ensure sustainability, and cooperation with similar projects in other countries.
The FLaReNet Strategic Agenda highlights the most pressing needs for the sector of Language Resources and Technologies and presents a set of recommendations for its development and progress in Europe, as issued from a three-year consultation of the FLaReNet European project. The FLaReNet recommendations are organised around nine dimensions: a) documentation b) interoperability c) availability, sharing and distribution d) coverage, quality and adequacy e) sustainability f) recognition g) development h) infrastructure and i) international cooperation. As such, they cover a broad range of topics and activities, spanning over production and use of language resources, licensing, maintenance and preservation issues, infrastructures for language resources, resource identification and sharing, evaluation and validation, interoperability and policy issues. The intended recipients belong to a large set of players and stakeholders in Language Resources and Technology, ranging from individuals to research and education institutions, to policy-makers, funding agencies, SMEs and large companies, service and media providers. The main goal of these recommendations is to serve as an instrument to support stakeholders in planning for and addressing the urgencies of the Language Resources and Technologies of the future.

2011

2010

In this paper I present the CLARIN-NL project, the Dutch national project that aims to play a central role in the European CLARIN infrastructure, not only for the preparatory phase, but also for the implementation and exploitation phases. I argue that the way the CLARIN-NL project has been set up can serve as an excellent example for other national CLARIN projects, for the following reasons: (1) it is a mix between a programme and a project; (2) it offers opportunities to seriously test standards and protocols currently proposed by CLARIN, thus providing evidence-based requirements and desiderata for the CLARIN infrastructure and ensuring compatibility of CLARIN with national data and tools; (3) it brings the intended users (humanities researchers) and the technology providers (infrastructure specialists and language and speech technology researchers) together in concrete cooperation projects, with a central role for the user’s research questions, thus ensuring that the infrastructure will provide functionality that is needed by its intended users.

2008

2006

In 2004 a consortium of ministries and organizations in the Netherlands and Flanders launched the comprehensive Dutch-Flemish HLT programme STEVIN (a Dutch acronym for “Essential Speech and Language Technology Resources”). To guarantee its Dutch-Flemish character, this large-scale programme is carried out under the auspices of the intergovernmental Dutch Language Union (NTU). The aim of STEVIN is to contribute to the further progress of HLT for the Dutch language, by raising awareness of HLT results, stimulating the demand for HLT products, promoting strategic research in HLT, and developing HLT resources that are essential and are known to be missing. Furthermore, a structure was set up for the management, maintenance and distribution of HLT resources. The STEVIN programme, which will run from 2004 to 2009, resulted from HLT activities in the Dutch language area, which were reported on at previous LREC conferences (2000, 2002, 2004). In this paper we will explain how different activities are combined in one comprehensive programme. We will show how cooperation can successfully be realized between different parties (language and speech technology, Flanders and the Netherlands, academia, industry and policy institutions) so as to achieve one common goal: progress in HLT.
The goal of this paper is (1) to illustrate a specific procedure for merging different monolingual lexicons, focussing on techniques for detecting and mapping equivalent lexical entries, and (2) to sketch a production model that enables one to obtain lexical resources via unification of existing data. We describe the creation of a Unified Lexicon (UL) from a common sample of the Italian PAROLE-SIMPLE-CLIPS phonological lexicon and of the Italian LCSTAR pronunciation lexicon. We expand previous experiments carried out at ILC-CNR: based on a detailed mechanism for mapping grammatical classifications of candidate UL entries, a consensual set of Unified Morphosyntactic Specifications (UMS) shared by lexica for the written and spoken areas is proposed. The impact of the UL on cross-validation issues is analysed: examining conflicts makes it possible to detect mismatches and diverging classifications in both resources. The work presented is in line with the activities promoted by ELRA towards the development of methods for packaging new language resources by combining independently created resources, and was carried out as part of the ELRA Production Committee activities. ELRA aims to exploit the UL experience to carry out such merging activities for resources available on the ELRA catalogue in order to fulfill the users' needs.

2004

1997

1989
