Thierry Declerck


2021

pdf bib
Towards the Addition of Pronunciation Information to Lexical Semantic Resources
Thierry Declerck | Lenka Bajčetić
Proceedings of the 11th Global Wordnet Conference

This paper describes ongoing work aimed at adding pronunciation information to lexical semantic resources, with a focus on open wordnets. Our goal is not only to add a new modality to those semantic networks, but also to mark heteronyms listed in them with the pronunciation information associated with their different meanings. This work could contribute in the longer term to the disambiguation of multi-modal resources, which combine text and speech.

pdf bib
Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6)
Luis Espinosa-Anke | Dagmar Gromann | Thierry Declerck | Anna Breit | Jose Camacho-Collados | Mohammad Taher Pilehvar | Artem Revenko
Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6)

pdf bib
Embeddings for the Lexicon: Modelling and Representation
Christian Chiarcos | Thierry Declerck | Maxim Ionov
Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6)

2020

pdf bib
Modelling Frequency and Attestations for OntoLex-Lemon
Christian Chiarcos | Maxim Ionov | Jesse de Does | Katrien Depuydt | Anas Fahad Khan | Sander Stolk | Thierry Declerck | John Philip McCrae
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

The OntoLex vocabulary enjoys increasing popularity as a means of publishing lexical resources with RDF and as Linked Data. The recent publication of a new OntoLex module for lexicography, lexicog, reflects its increasing importance for digital lexicography. However, not all aspects of digital lexicography have been covered to the same extent. In particular, supplementary information drawn from corpora, such as frequency information, links to attestations, and collocation data, was considered to be beyond the scope of lexicog. Therefore, the OntoLex community has put forward the proposal for a novel module for frequency, attestation and corpus information (FrAC) that not only covers the requirements of digital lexicography, but also accommodates essential data structures for lexical information in natural language processing. This paper introduces the current state of the OntoLex-FrAC vocabulary, describes its structure, some selected use cases, elementary concepts and fundamental definitions, with a focus on frequency and attestations.

pdf bib
Towards an Extension of the Linking of the Open Dutch WordNet with Dutch Lexicographic Resources
Thierry Declerck
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

This extended abstract presents ongoing work on interlinking and merging the Open Dutch WordNet and generic lexicographic resources for Dutch, focusing for now on the Dutch and English versions of Wiktionary and using the Algemeen Nederlands Woordenboek as a quality checking instance. As the Open Dutch WordNet is already equipped with a relevant number of complex lexical units, we are aiming at expanding it and proposing a new representational framework for the encoding of the interlinked and integrated data. The longer term goal of the work is to investigate whether and how senses can be restricted to particular morphological variations of Dutch lexical entries, and how to represent this information in a Linguistic Linked Open Data compliant format.

pdf bib
Adding Pronunciation Information to Wordnets
Thierry Declerck | Lenka Bajcetic | Melanie Siegel
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

We describe ongoing work on adding pronunciation information to wordnets, as such information can indicate specific senses of a word. Many wordnets associate with their senses only a lemma form and a part-of-speech tag. At the same time, we are aware that additional linguistic information can be useful for identifying a specific sense of a wordnet lemma when encountered in a corpus. While existing work already deals with the addition of grammatical number or grammatical gender information to wordnet lemmas, we are investigating the linking of wordnet lemmas to pronunciation information, thus adding a speech-related modality to wordnets.

pdf bib
On the Linguistic Linked Open Data Infrastructure
Christian Chiarcos | Bettina Klimek | Christian Fäth | Thierry Declerck | John Philip McCrae
Proceedings of the 1st International Workshop on Language Technology Platforms

In this paper we describe the current state of development of the Linguistic Linked Open Data (LLOD) infrastructure, an LOD (sub-)cloud of linguistic resources, which covers various linguistic databases, lexicons, corpora, terminology and metadata repositories. We give in some detail an overview of the contributions made by the European H2020 projects “Prêt-à-LLOD” (‘Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors’) and “ELEXIS” (‘European Lexicographic Infrastructure’) to the further development of the LLOD.

pdf bib
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)
Maxim Ionov | John P. McCrae | Christian Chiarcos | Thierry Declerck | Julia Bosque-Gil | Jorge Gracia
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

pdf bib
Proceedings of the 12th Language Resources and Evaluation Conference
Nicoletta Calzolari | Frédéric Béchet | Philippe Blache | Khalid Choukri | Christopher Cieri | Thierry Declerck | Sara Goggi | Hitoshi Isahara | Bente Maegaard | Joseph Mariani | Hélène Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the 12th Language Resources and Evaluation Conference

pdf bib
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi | John Philip McCrae | Sanni Nimb | Fahad Khan | Monica Monachini | Bolette Pedersen | Thierry Declerck | Tanja Wissik | Andrea Bellandi | Irene Pisani | Thomas Troelsgård | Sussi Olsen | Simon Krek | Veronika Lipp | Tamás Váradi | László Simon | András Gyorffy | Carole Tiberius | Tanneke Schoonheim | Yifat Ben Moshe | Maya Rudich | Raya Abu Ahmad | Dorielle Lonke | Kira Kovalenko | Margit Langemets | Jelena Kallas | Oksana Dereza | Theodorus Fransen | David Cillessen | David Lindemann | Mikel Alonso | Ana Salgado | José Luis Sancho | Rafael-J. Ureña-Ruiz | Jordi Porta Zamorano | Kiril Simov | Petya Osenova | Zara Kancheva | Ivaylo Radev | Ranka Stanković | Andrej Perdih | Dejan Gabrovsek
Proceedings of the 12th Language Resources and Evaluation Conference

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by creating new solutions, particularly those notoriously requiring data such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

pdf bib
Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures
Lilli Smal | Andrea Lösch | Josef van Genabith | Maria Giagkou | Thierry Declerck | Stephan Busemann
Proceedings of the 12th Language Resources and Evaluation Conference

Data is key in training modern language technologies. In this paper, we summarise the findings of the first pan-European study on obstacles to sharing language data across 29 EU Member States and CEF-affiliated countries, carried out under the ELRC White Paper action on “Sustainable Language Data Sharing to Support Language Equality in Multilingual Europe: Why Language Data Matters”. We present the methodology of the study and the obstacles identified, and report on recommendations on how to overcome them. The obstacles are classified into (1) lack of appreciation of the value of language data, (2) structural challenges, (3) disposition towards CAT tools and lack of digital skills, (4) inadequate language data management practices, (5) limited access to outsourced translations, and (6) legal concerns. Recommendations are grouped into those addressing the European/national policy level and those addressing the organisational/institutional level.

pdf bib
Recent Developments for the Linguistic Linked Open Data Infrastructure
Thierry Declerck | John Philip McCrae | Matthias Hartung | Jorge Gracia | Christian Chiarcos | Elena Montiel-Ponsoda | Philipp Cimiano | Artem Revenko | Roser Saurí | Deirdre Lee | Stefania Racioppa | Jamal Abdul Nasir | Matthias Orlikowski | Marta Lanau-Coronas | Christian Fäth | Mariano Rico | Mohammad Fazleh Elahi | Maria Khvalchik | Meritxell Gonzalez | Katharine Cooney
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper we describe the contributions made by the European H2020 project “Prêt-à-LLOD” (‘Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors’) to the further development of the Linguistic Linked Open Data (LLOD) infrastructure. Prêt-à-LLOD aims to develop a new methodology for building data value chains applicable to a wide range of sectors and applications and based around language resources and language technologies that can be integrated by means of semantic technologies. We describe the methods implemented for increasing the number of language data sets in the LLOD. We also present the approach for ensuring interoperability and for porting LLOD data sets and services to other infrastructures, as well as the contribution of the project to existing standards.

2019

pdf bib
OntoLex as a possible Bridge between WordNets and full lexical Descriptions
Thierry Declerck | Melanie Siegel
Proceedings of the 10th Global Wordnet Conference

In this paper we describe our current work on representing a recently created German lexical semantics resource in OntoLex-Lemon and in conformance with WordNet specifications. Besides presenting the representation effort, we show the utilization of OntoLex-Lemon to bridge from WordNet-like resources to full lexical descriptions and extend the coverage of WordNets to other types of lexical data, such as decomposition results, exemplified for German data, and inflectional phenomena, here outlined for English data.

pdf bib
Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH
Thierry Declerck | Melanie Siegel | Stefania Racioppa
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

We describe work consisting in porting two large German lexical resources into the OntoLex-Lemon model in order to establish complementary interlinkings between them. One resource is OdeNet (Open German WordNet) and the other is a further development of the German version of the MMORPH morphological analyzer. We show how the Multiword Expressions (MWEs) contained in OdeNet can be morphologically specified by the use of the lexical representation and linking features of OntoLex-Lemon, which also support the formulation of restrictions in the usage of such expressions.

pdf bib
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)
Luis Espinosa-Anke | Thierry Declerck | Dagmar Gromann | Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

pdf bib
Porting Multilingual Morphological Resources to OntoLex-Lemon
Thierry Declerck | Stefania Racioppa
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

We describe work consisting in porting various morphological resources to the OntoLex-Lemon model. A main objective of this work is to offer a uniform representation of different morphological data sets in order to be able to compare and interlink multilingual resources and to cross-check and interlink or merge the content of morphological resources of one and the same language. The results of our work will be published on the Linguistic Linked Open Data cloud.

2018

pdf bib
Proceedings of the Third Workshop on Semantic Deep Learning
Luis Espinosa Anke | Dagmar Gromann | Thierry Declerck
Proceedings of the Third Workshop on Semantic Deep Learning

bib
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Nicoletta Calzolari | Khalid Choukri | Christopher Cieri | Thierry Declerck | Sara Goggi | Koiti Hasida | Hitoshi Isahara | Bente Maegaard | Joseph Mariani | Hélène Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis | Takenobu Tokunaga
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Comparing Pretrained Multilingual Word Embeddings on an Ontology Alignment Task
Dagmar Gromann | Thierry Declerck
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
An Integrated Formal Representation for Terminological and Lexical Data included in Classification Schemes
Thierry Declerck | Kseniya Egorova | Eileen Schnur
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
European Language Resource Coordination: Collecting Language Resources for Public Sector Multilingual Information Management
Andrea Lösch | Valérie Mapelli | Stelios Piperidis | Andrejs Vasiļjevs | Lilli Smal | Thierry Declerck | Eileen Schnur | Khalid Choukri | Josef van Genabith
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
Hashtag Processing for Enhanced Clustering of Tweets
Dagmar Gromann | Thierry Declerck
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Rich data provided by tweets have been analyzed, clustered, and explored in a variety of studies. Typically those studies focus on named entity recognition, entity linking, and entity disambiguation or clustering. Tweets and hashtags are generally analyzed on sentential or word level but not on a compositional level of concatenated words. We propose an approach for a closer analysis of compounds in hashtags, and in the long run also of other types of text sequences in tweets, in order to enhance the clustering of such text documents. Hashtags have been used before as primary topic indicators to cluster tweets; however, their segmentation and its effect on clustering results have not been investigated to the best of our knowledge. Our results with a standard dataset from the Text REtrieval Conference (TREC) show that segmented and harmonized hashtags positively impact effective clustering.

pdf bib
Proceedings of the 2nd Workshop on Semantic Deep Learning (SemDeep-2)
Dagmar Gromann | Thierry Declerck | Georg Heigl
Proceedings of the 2nd Workshop on Semantic Deep Learning (SemDeep-2)

pdf bib
Multilingual Ontologies for the Representation and Processing of Folktales
Thierry Declerck | Anastasija Aman | Martin Banzer | Dominik Macháček | Lisa Schäfer | Natalia Skachkova
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe

We describe work done in the field of folkloristics and consisting in creating ontologies based on well-established studies proposed by “classical” folklorists. This work is supporting the availability of a huge amount of digital and structured knowledge on folktales to digital humanists. The ontological encoding of past and current motif-indexation and classification systems for folktales was in the first step limited to English language data. This led us to focus on making those newly generated formal knowledge sources available in a few more languages, like German, Russian and Bulgarian. We stress the importance of achieving this multilingual extension of our ontologies at a larger scale, in order for example to support the automated analysis and classification of such narratives in a large variety of languages, as those are getting more and more accessible on the Web.

2016

pdf bib
Towards a Formal Representation of Components of German Compounds
Thierry Declerck | Piroska Lendvai
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

pdf bib
Towards a WordNet based Classification of Actors in Folktales
Thierry Declerck | Tyler Klement | Antonia Kostova
Proceedings of the 8th Global WordNet Conference (GWC)

In the context of a student software project we are investigating the use of WordNet for improving the automatic detection and classification of actors (or characters) mentioned in folktales. Our starting point is the book “Classification of International Folktales”, out of which we extract text segments that name the different actors involved in tales, taking advantage of patterns used by its author, Hans-Jörg Uther. We apply to those text segments functions that are implemented in the NLTK interface to WordNet in order to obtain lexical semantic information to enrich the original naming of characters proposed in the “Classification of International Folktales” and to support their translation into other languages.

bib
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Sara Goggi | Marko Grobelnik | Bente Maegaard | Joseph Mariani | Helene Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

pdf bib
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
John Philip McCrae | Christian Chiarcos | Francis Bond | Philipp Cimiano | Thierry Declerck | Gerard de Melo | Jorge Gracia | Sebastian Hellmann | Bettina Klimek | Steven Moran | Petya Osenova | Antonio Pareja-Lora | Jonathan Pool
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud of linguistic resources, which covers various linguistic databases, lexicons, corpora, terminologies, and metadata repositories. We present and summarize five years of progress on the development of the cloud and of advancements in open data in linguistics, and we describe recent community activities. The paper aims to serve as a guideline to orient and involve researchers with the community and/or Linguistic Linked Open Data.

pdf bib
Monolingual Social Media Datasets for Detecting Contradiction and Entailment
Piroska Lendvai | Isabelle Augenstein | Kalina Bontcheva | Thierry Declerck
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Entailment recognition approaches are useful for application domains such as information extraction, question answering or summarisation, for which evidence from multiple sentences needs to be combined. We report on a new 3-way judgement Recognizing Textual Entailment (RTE) resource that originates in the Social Media domain, and explain our semi-automatic creation method for the special purpose of information verification, which draws on manually established rumourous claims reported during crisis events. From about 500 English tweets related to 70 unique claims we compile and evaluate 5.4k RTE pairs, while continuing to automate the workflow to generate similar-sized datasets in other languages.

2015

pdf bib
Towards the Representation of Hashtags in Linguistic Linked Open Data Format
Thierry Declerck | Piroska Lendvai
Proceedings of the Second Workshop on Natural Language Processing and Linked Open Data

pdf bib
Processing and Normalizing Hashtags
Thierry Declerck | Piroska Lendvai
Proceedings of the International Conference Recent Advances in Natural Language Processing

2014

bib
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Hrafn Loftsson | Bente Maegaard | Joseph Mariani | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
TMO — The Federated Ontology of the TrendMiner Project
Hans-Ulrich Krieger | Thierry Declerck
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes work carried out in the European project TrendMiner, which partly deals with the extraction and representation of real-time information from dynamic data streams. The focus of this paper lies on the construction of an integrated ontology, TMO, the TrendMiner Ontology, that has been assembled from several independent multilingual taxonomies and ontologies which are brought together by an interface specification, expressed in OWL. Within TrendMiner, TMO serves as a common language that helps to interlink data delivered from both symbolic and statistical components of the TrendMiner system. Very often, the extracted data is supplied as quintuples, RDF triples that are extended by two further temporal arguments, expressing the temporal extent in which an atemporal statement is true. In this paper, we will also sneak a peek at the temporal entailment rules and queries that are built into the semantic repository hosting the data and which can be used to derive useful new information.

pdf bib
A SKOS-based Schema for TEI encoded Dictionaries at ICLTT
Thierry Declerck | Karlheinz Mörth | Eveline Wandl-Vogt
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

At our institutes we are working with a number of dictionaries and lexical resources in the field of less-resourced language data, like dialects and historical languages. We are aiming at publishing those lexical data in the Linked Open Data framework in order to link them with available data sets for highly-resourced languages, thus elevating them to the same “digital dignity” the mainstream languages have gained. In this paper we concentrate on two TEI encoded variants of the Arabic language and propose a mapping of this TEI encoded data onto SKOS, showing how the lexical entries of the two dialectal dictionaries can be linked to other language resources available in the Linked Open Data cloud.

pdf bib
Harmonization of German Lexical Resources for Opinion Mining
Thierry Declerck | Hans-Ulrich Krieger
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present ongoing work on the harmonization of existing German lexical resources in the field of opinion and sentiment mining. The input of our harmonization effort consisted of four distinct lexicons of German word forms, encoded either as lemmas or as full forms, marked up with polarity features, at distinct granularity levels. We describe how the lexical resources have been mapped onto each other, generating a unique list of entries, with unified Part-of-Speech information and basic polarity features. Future work will be dedicated to the comparison of the harmonized lexicon with German corpora annotated with polarity information. We are further aiming at both linking the harmonized German lexical resources with similar resources in other languages and publishing the resulting set of lexical data in the context of the Linguistic Linked Open Data cloud.

pdf bib
How to semantically relate dialectal Dictionaries in the Linked Data Framework
Thierry Declerck | Eveline Wandl-Vogt
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

pdf bib
Harmonizing Lexical Data for their Linking to Knowledge Objects in the Linked Data Framework
Thierry Declerck
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

pdf bib
SentiMerge: Combining Sentiment Lexicons in a Bayesian Framework
Guy Emerson | Thierry Declerck
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

2013

pdf bib
Integration of the Thesaurus for the Social Sciences (TheSoz) in an Information Extraction System
Thierry Declerck
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

pdf bib
Porting Elements of the Austrian Baroque Corpus onto the Linguistic Linked Open Data Format
Ulrike Czeitschner | Thierry Declerck | Claudia Resch
Proceedings of the Joint Workshop on NLP&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction

pdf bib
Linguistically analyzed labels of knowledge objects: How can they support OBIE? Lessons learned from the Monnet and TrendMiner projects
Thierry Declerck
Proceedings of the Joint Workshop on NLP&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction

pdf bib
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data
Christian Chiarcos | Philipp Cimiano | Thierry Declerck | John P. McCrae
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data

pdf bib
Linguistic Linked Open Data (LLOD). Introduction and Overview
Christian Chiarcos | Philipp Cimiano | Thierry Declerck | John P. McCrae
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data

2012

pdf bib
Ontology-Based Incremental Annotation of Characters in Folktales
Thierry Declerck | Nikolina Koleva | Hans-Ulrich Krieger
Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

bib
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Mehmet Uğur Doğan | Bente Maegaard | Joseph Mariani | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

pdf bib
Accessing and standardizing Wiktionary lexical entries for the translation of labels in Cultural Heritage taxonomies
Thierry Declerck | Karlheinz Mörth | Piroska Lendvai
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe the usefulness of Wiktionary, the freely available web-based lexical resource, in providing multilingual extensions to catalogues that serve content-based indexing of folktales and related narratives. We develop conversion tools between Wiktionary and TEI, using ISO standards (LMF, MAF), to make such resources available to both the Digital Humanities community and the Language Resources community. The converted data can be queried via a web interface, while the tools of the workflow are to be released with an open source license. We report on the current state and functionality of our tools and analyse some shortcomings of Wiktionary, as well as potential domains of application.

pdf bib
The META-SHARE Metadata Schema for the Description of Language Resources
Maria Gavrilidou | Penny Labropoulou | Elina Desipri | Stelios Piperidis | Haris Papageorgiou | Monica Monachini | Francesca Frontini | Thierry Declerck | Gil Francopoulo | Victoria Arranz | Valerie Mapelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a metadata model for the description of language resources proposed in the framework of the META-SHARE infrastructure, aiming to cover both datasets and tools/technologies used for their processing. It places the model in the overall framework of metadata models, describes the basic principles and features of the model, elaborates on the distinction between minimal and maximal versions thereof, briefly presents the integrated environment supporting the LRs description and search and retrieval processes and concludes with work to be done in the future for the improvement of the model.

2010

pdf bib
LAF/GrAF-grounded Representation of Dependency Structures
Yoshihiko Hayashi | Thierry Declerck | Chiharu Narawa
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper shows that a LAF/GrAF-based annotation schema can be used for the adequate representation of syntactic dependency structures, possibly in many languages. We first argue that there are at least two types of textual units that can be annotated with dependency information: words/tokens and chunks/phrases. We especially focus on the importance of the latter dependency unit: it is particularly useful for representing Japanese dependency structures, known as Kakari-Uke structure. Based on this consideration, we then discuss a sub-typing of GrAF to represent the corresponding dependency structures. We derive three node types, two edge types, and the associated constraints for properly representing both the token-based and the chunk-based dependency structures. We finally propose a wrapper program that, as a proof of concept, converts output data from different dependency parsers in proprietary XML formats to the GrAF-compliant XML representation. It partially proves the value of an international standard like LAF/GrAF in the Web service context: an existing dependency parser can be, in a sense, standardized, once wrapped by a data format conversion process.

pdf bib
Integration of Linguistic Markup into Semantic Models of Folk Narratives: The Fairy Tale Use Case
Piroska Lendvai | Thierry Declerck | Sándor Darányi | Pablo Gervás | Raquel Hervás | Scott Malec | Federico Peinado
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Propp's influential structural analysis of fairy tales created a powerful schema for representing storylines in terms of character functions, which is directly exploitable for computational semantic analysis, and procedural generation of stories of this genre. We tackle two resources that draw on the Proppian model - one formalizes it as a semantic markup scheme and the other as an ontology -, both lacking linguistic phenomena explicitly represented in them. The need for integrating linguistic information into structured semantic resources is motivated by the emergence of suitable standards that facilitate this, as well as the benefits such joint representation would create for transdisciplinary research across Digital Humanities, Computational Linguistics, and Artificial Intelligence.

pdf bib
Towards a Standardized Linguistic Annotation of the Textual Content of Labels in Knowledge Representation Systems
Thierry Declerck | Piroska Lendvai
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We propose applying standardized linguistic annotation to terms included in labels of knowledge representation schemes (taxonomies or ontologies), hypothesizing that this would help improve ontology-based semantic annotation of texts. We share the view that currently used methods for including lexical and terminological information in such hierarchical networks of concepts are not satisfactory, and thus put forward, as a preliminary step to our annotation goal, a model for modular representation of conceptual, terminological and linguistic information within knowledge representation systems. Our CTL model is based on two recent initiatives that describe the representation of terminologies and lexicons in ontologies: the Terminae method for building terminological and ontological models from text (Aussenac-Gilles et al., 2008), and the LexInfo metamodel for ontology lexica (Buitelaar et al., 2009). CTL goes beyond the mere fusion of the two models and introduces an additional level of representation for the linguistic objects, which are no longer limited to lexical information but cover the full range of linguistic phenomena, including constituency and dependency. We also show that the approach benefits linguistic and semantic analysis of external documents that are often to be linked to semantic resources for enrichment with concepts that are newly extracted or inferred.

pdf bib
Extraction, Merging, and Monitoring of Company Data from Heterogeneous Sources
Christian Federmann | Thierry Declerck
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe the implementation of an enterprise monitoring system that builds on an ontology-based information extraction (OBIE) component applied to heterogeneous data sources. The OBIE component consists of several IE modules, each extracting at regular intervals a specific fraction of company data from a given data source, and a merging tool, which is used to aggregate all the extracted information about a company. The full set of information about companies that is to be extracted and merged by the OBIE component is given in the schema of a domain ontology, which guides the information extraction process. When the monitoring system detects changes in the extracted and merged information on a company with respect to the current state of the knowledge base of the underlying ontology, it updates the population of the ontology. As we are using an ontology extended with temporal information, the system is able to assign time intervals to any of the object instances. Additionally, detected changes can be communicated to end-users, who can validate and possibly correct the resulting updates in the knowledge base.

2009

pdf bib
Proceedings of SRSL 2009, the 2nd Workshop on Semantic Representation of Spoken Language
Manuel Alcantara-Pla | Thierry Declerck
Proceedings of SRSL 2009, the 2nd Workshop on Semantic Representation of Spoken Language

pdf bib
Concept and Relation Extraction in the Finance Domain
Mihaela Vela | Thierry Declerck
Proceedings of the Eight International Conference on Computational Semantics

2008

pdf bib
Foundation of a Component-based Flexible Registry for Language Resources and Technology
Daan Broeder | Thierry Declerck | Erhard Hinrichs | Stelios Piperidis | Laurent Romary | Nicoletta Calzolari | Peter Wittenburg
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Within the CLARIN e-science infrastructure project it is foreseen to develop a component-based registry for metadata for Language Resources and Language Technology. With this registry it is hoped to overcome the problems of the currently available systems with respect to inflexible fixed schemas, unsuitable terminology and interoperability. The registry will address interoperability needs by referring to a shared vocabulary registered in data category registries as suggested by ISO.

pdf bib
A Framework for Standardized Syntactic Annotation
Thierry Declerck
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This poster presents an ISO framework for the standardization of syntactic annotation (SynAF). The normative part of SynAF is concerned with a metamodel for syntactic annotation that covers both dimensions of constituency and dependency, and thus proposes a multi-layered annotation framework that allows the combined and interrelated annotation of language data along both lines of consideration. This standard is designed to be used in close conjunction with the metamodel presented in the Linguistic Annotation Framework (LAF) and with ISO 12620, Terminology and other language resources - Data categories.

2006

pdf bib
Generic NLP Tools for Supporting Shallow Ontology Building
Thierry Declerck | Mihaela Vela
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present on-going investigations on how complex syntactic annotation, combined with linguistic semantics, can possibly help in supporting the semi-automatic building of (shallow) ontologies from text by proposing an automated extraction of (possibly underspecified) semantic relations from linguistically annotated text.

pdf bib
Multilingual Lexical Semantic Resources for Ontology Translation
Thierry Declerck | Asunción Gómez Pérez | Ovidiu Vela | Zeno Gantner | David Manzano-Macho
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe the integration of some multilingual language resources in ontological descriptions, with the purpose of providing ontologies, which are normally using concept labels in just one (natural) language, with multilingual facility in their design and use in the context of Semantic Web applications, supporting both the semantic annotation of textual documents with multilingual ontology labels and ontology extraction from multilingual text sources.

pdf bib
SynAF: Towards a Standard for Syntactic Annotation
Thierry Declerck
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present the current state of development of an international standard for syntactic annotation, called SynAF. This standard is being prepared by the Technical Committee ISO/TC 37 (Terminology and Other Language Resources), Subcommittee SC 4 (Language Resource Management), in collaboration with the European eContent Project “LIRICS” (Linguistic Infrastructure for Interoperable Resources and Systems).

pdf bib
Annotating text using the Linguistic Description Scheme of MPEG-7: The DIRECT-INFO Scenario
Thierry Declerck | Stephan Busemann | Herwig Rehatschek | Gert Kienast
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing

2004

pdf bib
A Large Metadata Domain of Language Resources
Daan Broeder | Thierry Declerck | Laurent Romary | Markus Uneson | Sven Strömqvist | Peter Wittenburg
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Towards a Language Infrastructure for the Semantic Web
Thierry Declerck | Paul Buitelaar | Nicoletta Calzolari | Alessandro Lenci
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Towards Ontology Engineering Based on Linguistic Analysis
Paul Buitelaar | Daniel Olejnik | Mihaela Hutanu | Alexander Schutz | Thierry Declerck | Michael Sintek
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Towards an International Standard on Feature Structure Representation
Kiyong Lee | Lou Burnard | Laurent Romary | Eric de la Clergerie | Thierry Declerck | Syd Bauman | Harry Bunt | Lionel Clément | Tomaž Erjavec | Azim Roussanaly | Claude Roux
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
Event-Coreference across Multiple, Multi-lingual Sources in the Mumis Project
Horacio Saggion | Jan Kuper | Hamish Cunningham | Thierry Declerck | Peter Wittenburg | Marco Puts | Eduard Hoenkamp | Franciska de Jong | Yorick Wilks
Demonstrations

2002

pdf bib
LREP: A Language Repository Exchange Protocol
Daan Broeder | Peter Wittenburg | Thierry Declerck | Laurent Romary
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
COLLATE: Competence Center in Speech and Language Technology
Joanne Capstick | Hans Uszkoreit | Wolfgang Wahlster | Thierry Declerck | Gregor Erbach | Anthony Jameson | Brigitte Jorg | Reinhard Karger | Tillmann Wegst
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf bib
The Automatic Generation of Formal Annotations in a Multimedia Indexing and Searching Environment
Thierry Declerck | Peter Wittenburg | Hamish Cunningham
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management

pdf bib
Introduction: Extending NLP Tools Repositories for the Interaction with Language Data Resource Repositories
Thierry Declerck
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources

2000

pdf bib
The New Edition of the Natural Language Software Registry (an Initiative of ACL hosted at DFKI)
Thierry Declerck | Alexander Werner Jachmann | Hans Uszkoreit
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

pdf bib
Natural Language Access to Software Applications
Paul Schmidt | Marius Groenendijk | Peter Phelan | Henrik Schulz | Sibylle Rieder | Axel Theofilidis | Thierry Declerck | Andrew Bredenkamp
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib
Natural Language Access to Software Applications
Paul Schmidt | Sibylle Rieder | Axel Theofilidis | Marius Groenendijk | Peter Phelan | Henrik Schulz | Thierry Declerck | Andrew Bredenkamp
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1997

pdf bib
Natural Language Dialogue Service for Appointment Scheduling Agents
Stephan Busemann | Thierry Declerck | Abdel Kader Diagne | Luca Dini | Judith Klein | Sven Schmeier
Fifth Conference on Applied Natural Language Processing

pdf bib
Semantic Tagging and NLP Applications
Thierry Declerck | Judith Klein
Tagging Text with Lexical Semantics: Why, What, and How?

1996

pdf bib
Dealing with Cross-Sentential Anaphora Resolution in ALEP
Thierry Declerck
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

pdf bib
Lean Formalisms, Linguistic Theory and Applications. Grammar Development in ALEP.
Paul Schmidt | Axel Theofilidis | Sibylle Rieder | Thierry Declerck
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

pdf bib
Efficient Integrated Tagging of Word Constructs
Andrew Bredenkamp | Frederik Fouvry | Thierry Declerck | Bradley Music
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics
