2018
LREMap, a Song of Resources and Evaluation
Riccardo Del Gratta | Sara Goggi | Gabriella Pardelli | Nicoletta Calzolari
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
LREC as a Graph: People and Resources in a Network
Riccardo Del Gratta | Francesca Frontini | Monica Monachini | Gabriella Pardelli | Irene Russo | Roberto Bartolini | Fahad Khan | Claudia Soria | Nicoletta Calzolari
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This proposal describes a new way to visualise resources in the LREMap, a community-built repository of language resource descriptions and uses. The LREMap is represented as a force-directed graph, where resources, papers and authors are nodes. The analysis of the visual representation of the underlying graph is used to study how the community gathers around LRs and how LRs are used in research.
New Developments in the LRE Map
Vladimir Popescu | Lin Liu | Riccardo Del Gratta | Khalid Choukri | Nicoletta Calzolari
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In this paper we describe the new developments brought to LRE Map, especially in terms of the user interface of the Web application, of the searching of the information therein, and of the data model updates.
Ancient Greek WordNet Meets the Dynamic Lexicon: the Example of the Fragments of the Greek Historians
Monica Berti | Yuri Bizzoni | Federico Boschetti | Gregory R. Crane | Riccardo Del Gratta | Tariq Yousef
Proceedings of the 8th Global WordNet Conference (GWC)
The Ancient Greek WordNet (AGWN) and the Dynamic Lexicon (DL) are multilingual resources to study the lexicon of Ancient Greek texts and their translations. Both AGWN and DL are works in progress that need accuracy improvement and manual validation. After a detailed description of the current state of each work, this paper illustrates a methodology to cross AGWN and DL data, in order to mutually score the items of each resource according to the evidence provided by the other resource. The training data is based on the corpus of the Digital Fragmenta Historicorum Graecorum (DFHG), which includes ancient Greek texts with Latin translations.
2014
The Making of Ancient Greek WordNet
Yuri Bizzoni | Federico Boschetti | Harry Diakoff | Riccardo Del Gratta | Monica Monachini | Gregory Crane
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes the process of creation and review of a new lexico-semantic resource for classical studies: Ancient Greek WordNet. The candidate sets of synonyms (synsets) are extracted from Greek-English dictionaries, on the assumption that Greek words translated by the same English word or phrase have a high probability of being synonyms or at least semantically closely related. The process of validation and the web interface developed to edit and query the resource are described in detail. The lexical coverage of Ancient Greek WordNet is illustrated and the accuracy is evaluated. Finally, scenarios for exploiting the resource are discussed.
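The extraction assumption stated in this abstract (Greek words sharing an English translation are candidate synonyms) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `candidate_synsets`, the dictionary layout, and the transliterated toy entries are all assumptions made for illustration.

```python
from collections import defaultdict


def candidate_synsets(dictionary):
    """Group lemmas that share an English gloss (hypothetical helper).

    `dictionary` maps a Greek lemma (transliterated here) to a list of
    English glosses. The layout is an assumption, not the paper's format.
    """
    by_gloss = defaultdict(set)
    for lemma, glosses in dictionary.items():
        for gloss in glosses:
            # Normalise the gloss so "Horse " and "horse" fall together.
            by_gloss[gloss.strip().lower()].add(lemma)
    # A gloss shared by at least two lemmas yields one candidate synset.
    return {g: sorted(ws) for g, ws in by_gloss.items() if len(ws) >= 2}


# Toy data for illustration only; not taken from the resource.
toy = {
    "hippos": ["horse"],
    "polos": ["foal", "horse"],
    "oikos": ["house"],
    "domos": ["house", "building"],
}
synsets = candidate_synsets(toy)
```

Candidates produced this way would still need the manual validation step the abstract mentions, since a shared gloss only suggests semantic closeness.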
META-SHARE: One year after
Stelios Piperidis | Harris Papageorgiou | Christian Spurk | Georg Rehm | Khalid Choukri | Olivier Hamon | Nicoletta Calzolari | Riccardo del Gratta | Bernardo Magnini | Christian Girardi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents META-SHARE (www.meta-share.eu), an open language resource infrastructure, and its usage since its Europe-wide deployment in early 2013. META-SHARE is a network of repositories that store language resources (data, tools and processing services) documented with high-quality metadata, aggregated in central inventories allowing for uniform search and access. META-SHARE was developed by META-NET (www.meta-net.eu) and aims to serve as an important component of a language technology marketplace for researchers, developers, professionals and industrial players, catering for the full development cycle of language technology, from research through to innovative products and services. The observed usage in its initial steps, the steadily increasing number of network nodes, resources, users, queries, views and downloads are all encouraging and considered as supportive of the choices made so far. In tandem, take-up activities like direct linking and processing of datasets by language processing services as well as metadata transformation to RDF are expected to open new avenues for data and resources linking and boost the organic growth of the infrastructure while facilitating language technology deployment by much wider research communities and industrial sectors.
The LRE Map disclosed
Riccardo Del Gratta | Gabriella Pardelli | Sara Goggi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes a serialization of the LRE Map database according to the RDF model. Due to the peculiar nature of the LRE Map, many ontologies are necessary to model the map in RDF, including newly created and reused ontologies. The importance of having the LRE Map in RDF and its connections to other open resources is also addressed.
2013
Generative Lexicon Theory and Linguistic Linked Open Data
Fahad Khan | Francesca Frontini | Riccardo Del Gratta | Monica Monachini | Valeria Quochi
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)
Towards the establishment of a linguistic linked data network for Italian
Roberto Bartolini | Riccardo Del Gratta | Francesca Frontini
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data
2012
The Language Library: supporting community effort for collective resource production
Riccardo Del Gratta | Francesca Frontini | Francesco Rubino | Irene Russo | Nicoletta Calzolari
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Relations among phenomena at different linguistic levels are at the essence of language properties, but today we focus mostly on one specific linguistic layer at a time, without (having the possibility of) paying attention to the relations among the different layers. At the same time our efforts are too scattered, without much possibility of exploiting other people's achievements. To address the complexities hidden in multilayer interrelations even small amounts of processed data can be useful, improving the performance of complex systems. Exploiting the current trend towards sharing, we want to initiate a collective movement that works towards creating synergies and harmonisation among different annotation efforts that are now dispersed. In this paper we present the general architecture of the Language Library, an initiative which is conceived as a facility for gathering and making available through simple functionalities the linguistic knowledge the field is able to produce, putting in place new ways of collaboration within the LRT community. In order to reach this goal, a first population round of the Language Library has started around a core of parallel/comparable texts that have been annotated by several contributors submitting a paper for LREC2012. The Language Library also has an ancillary aim related to language documentation and archiving, and it is conceived as a theory-neutral space which allows for several language processing philosophies to coexist.
The LRE Map. Harmonising Community Descriptions of Resources
Nicoletta Calzolari | Riccardo Del Gratta | Gil Francopoulo | Joseph Mariani | Francesco Rubino | Irene Russo | Claudia Soria
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Accurate and reliable documentation of Language Resources is an indisputable need: documentation is the gateway to discovery of Language Resources, a necessary step towards promoting the data economy. Language resources that are not documented virtually do not exist: for this reason every initiative able to collect and harmonise metadata about resources represents a valuable opportunity for the NLP community. In this paper we describe the LRE Map, reporting statistics on resources associated with LREC2012 papers and providing comparisons with LREC2010 data. The LRE Map, jointly launched by FLaReNet and ELRA in conjunction with the LREC 2010 Conference, is an instrument for enhancing availability of information about resources, either new or already existing ones. It aims to reinforce and facilitate the use of standards in the community. The LRE Map web interface provides the possibility of searching according to a fixed set of metadata and of viewing the details of extracted resources. The LRE Map is continuing to collect bottom-up input about resources from authors of other conferences through the standard submission process. This will help broaden the notion of language resources and attract to the field neighboring disciplines that so far have been only marginally involved in the standard notion of language resources.
2011
The Language Library: Many Layers, More Knowledge
Nicoletta Calzolari | Riccardo Del Gratta | Francesca Frontini | Irene Russo
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm
2010
A Bilingual Dictionary Mexican Sign Language-Spanish/Spanish-Mexican Sign Language
Antoinette Hawayek | Riccardo Del Gratta | Giuseppe Cappelli
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We present a three-part bilingual specialized dictionary Mexican Sign Language-Spanish / Spanish-Mexican Sign Language. This dictionary will be the outcome of a three-year agreement between the Italian Consiglio Nazionale delle Ricerche and the Mexican Conacyt. Although many other sign language dictionaries have been provided to deaf communities, there are as yet no Mexican Sign Language dictionaries in Mexico. We want to stress the specialized nature of the proposed dictionary: the bilingual dictionary will contain frequently used general Spanish forms along with specialized words, specific to school courses, whose meanings warrant comprehension of school curricula. We emphasize that this aspect of the bilingual dictionary can have a deep social impact, since we will give deaf people the possibility of gaining competence in the official language, which is necessary to ensure access to the school curriculum and to become full-fledged citizens. From a technical point of view, the dictionary consists of a relational database, where we have saved the sign parameters, and a graphical user interface especially designed to allow deaf children to retrieve signs using the relevant parameters and, thus, the meaning of the sign in Spanish.
The LREC Map of Language Resources and Technologies
Nicoletta Calzolari | Claudia Soria | Riccardo Del Gratta | Sara Goggi | Valeria Quochi | Irene Russo | Khalid Choukri | Joseph Mariani | Stelios Piperidis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In this paper we present the LREC Map of Language Resources and Tools, an innovative feature introduced with this LREC. The purpose of the Map is to shed light on the vast amount of resources and tools that represent the background of the research presented at LREC, in an attempt to fill a gap in the community's knowledge about the resources and tools that are used or created worldwide. It also aims at a change of culture in the field, actively engaging each researcher in the task of documenting resources. The Map has been developed on the basis of the information provided by LREC authors during the submission of papers to the LREC 2010 conference and the LREC workshops, and contains information about almost 2000 resources. The paper illustrates the motivation behind this initiative, its main characteristics, its relevance and future impact in the field, the metadata used to describe the resources, and finally presents some of the most relevant findings.
2008
A lexicon for biology and bioinformatics: the BOOTStrep experience.
Valeria Quochi | Monica Monachini | Riccardo Del Gratta | Nicoletta Calzolari
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting for support to information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale lexical-terminological resource encoding different information types in one single integrated resource. In the design of the resource we follow the ISO/DIS 24613 Lexical Mark-up Framework standard, which ensures reusability of the information encoded and easy exchange of both data and architecture. The design of the resource also takes into account the needs of our text mining partners who automatically extract syntactic and semantic information from texts and feed it to the lexicon. The present contribution first describes in detail the model of the BioLexicon along its three main layers: morphology, syntax and semantics; then, it briefly describes the database implementation of the model and the population strategy followed within the project, together with an example. The BioLexicon database in fact comes equipped with automatic uploading procedures based on a common exchange XML format, which guarantees that the lexicon can be properly populated with data coming from different sources.
UFRA: a UIMA-based Approach to Federated Language Resource Architecture
Riccardo Del Gratta | Roberto Bartolini | Tommaso Caselli | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper we address the issue of developing an interoperable infrastructure for language resources and technologies. In our approach, called UFRA, we extend the Federated Database Architecture System, adding typical functionalities coming from UIMA. In this way, we capitalize on the advantages of a federated architecture, such as autonomy, heterogeneity and distribution of components, monitored by a central authority responsible for checking both the integration of components and user rights on performing different tasks. We use the UIMA approach to manage and define one common front-end, enabling users and clients to query, retrieve and use language resources and technologies. The purpose of this paper is to show how UIMA leads from a Federated Database Architecture to a Federated Resource Architecture, adding to a registry of available components both static resources such as lexicons and corpora and dynamic ones such as tools and general purpose language technologies. At the end of the paper, we present a case-study that adopts this framework to integrate the SIMPLE lexicon and TIMEML annotation guidelines to tag natural language texts.
Simple-Clips ongoing research: more information with less data by implementing inheritance
Riccardo Del Gratta | Nilda Ruimy | Antonio Toral
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper presents the application of inheritance to the formal taxonomy (is-a) of a semantically rich Language Resource based on the Generative Lexicon theory, SIMPLE-CLIPS. The aim is to lighten the representation of its semantic layer by reducing the number of encoded relations. A prediction calculation on the impact of introducing inheritance regarding space occupancy is carried out, yielding a significant space reduction of 22%. This is corroborated by its actual application, which reduces the number of explicitly encoded relations in this lexicon by 18.4%. Later on, we study the issues that inheritance poses to Language Resources, and discuss sensible solutions to tackle each of them, including examples. Finally, we present a discussion on the application of inheritance, from which two side-effect advantages arise: consistency enhancement and inference capabilities.
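The idea of lightening a lexicon by inheritance along the is-a taxonomy can be sketched as follows. This is a toy illustration, not the SIMPLE-CLIPS implementation: the helper `prune_inherited`, the single-parent acyclic taxonomy, and the example relations are assumptions, and the reduction it yields is unrelated to the figures reported in the abstract.

```python
def prune_inherited(is_a, relations):
    """Drop relations a node can inherit from an ancestor (hypothetical helper).

    `is_a` maps each node to its single parent (assumed acyclic);
    `relations` maps each node to the set of relations encoded on it.
    """
    def inherited(node):
        acc = set()
        parent = is_a.get(node)
        # Walk up the is-a chain, collecting every ancestor's relations.
        while parent is not None:
            acc |= relations.get(parent, set())
            parent = is_a.get(parent)
        return acc

    # Keep only the relations that no ancestor already encodes.
    return {node: rels - inherited(node) for node, rels in relations.items()}


# Toy taxonomy and relations, invented for illustration.
is_a = {"dog": "animal", "animal": "entity"}
relations = {
    "entity": {("has_property", "exists")},
    "animal": {("has_property", "exists"), ("has_property", "alive")},
    "dog": {
        ("has_property", "exists"),
        ("has_property", "alive"),
        ("typical_sound", "bark"),
    },
}
pruned = prune_inherited(is_a, relations)
before = sum(len(r) for r in relations.values())
after = sum(len(r) for r in pruned.values())
```

Comparing `before` and `after` gives the number of explicitly encoded relations saved on the toy data; the full relation set of a node can be recovered by walking its ancestors again.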