Andrea Marchetti

2014

pdf abs
Accommodations in Tuscany as Linked Data
Clara Bacciu | Angelica Lo Duca | Andrea Marchetti | Maurizio Tesconi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The OpeNER Linked Dataset (OLD) contains 19.140 entries about accommodations in Tuscany (Italy). For each accommodation, it describes the type, e.g. hotel, bed and breakfast, hostel, camping etc., and other useful information, such as a short description, the Web address, its location and the features it provides. OLD is the linked data version of the open dataset provided by Fondazione Sistema Toscana, the representative system for tourism in Tuscany. In addition, to the original dataset, OLD provides also the link of each accommodation to the most common social media (Facebook, Foursquare, Google Places and Booking). OLD exploits three common ontologies of the accommodation domain: Acco, Hontology and GoodRelations. The idea is to provide a flexible dataset, which speaks more than one ontology. OLD is available as a SPARQL node and is released under the Creative Commons release. Finally, OLD is developed within the OpeNER European project, which aims at building a set of ready to use tools to recognize and disambiguate entity mentions and perform sentiment analysis and opinion detection on texts. Within the project, OLD provides a named entity repository for entity disambiguation.

In the last few years the amount of manuscripts digitized and made available on the Web has been constantly increasing. However, there is still a considarable lack of results concerning both the explicitation of their content and the tools developed to make it available. The objective of the Clavius on the Web project is to develop a Web platform exposing a selection of Christophorus Clavius letters along with three different levels of analysis: linguistic, lexical and semantic. The multilayered annotation of the corpus involves a XML-TEI encoding followed by a tokenization step where each token is univocally identified through a CTS urn notation and then associated to a part-of-speech and a lemma. The text is lexically and semantically annotated on the basis of a lexicon and a domain ontology, the former structuring the most relevant terms occurring in the text and the latter representing the domain entities of interest (e.g. people, places, etc.). Moreover, each entity is connected to linked and non linked resources, including DBpedia and VIAF. Finally, the results of the three layers of analysis are gathered and shown through interactive visualization and storytelling techniques. A demo version of the integrated architecture was developed.

2009

pdf
SemEval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain
Eneko Agirre | Oier Lopez de Lacalle | Christiane Fellbaum | Andrea Marchetti | Antonio Toral | Piek Vossen
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

2008

We outline work performed within the framework of a current EC project. The goal is to construct a language-independent information system for a specific domain (environment/ecology/biodiversity) anchored in a language-independent ontology that is linked to wordnets in seven languages. For each language, information extraction and identification of lexicalized concepts with ontological entries is carried out by text miners (Kybots). The mapping of language-specific lexemes to the ontology allows for crosslinguistic identification and translation of equivalent terms. The infrastructure developed within this project enables long-range knowledge sharing and transfer across many languages and cultures, addressing the need for global and uniform transition of knowledge beyond the specific domains addressed here.

2006

pdf abs
Moving to dynamic computational lexicons with LeXFlow
Claudia Soria | Maurizio Tesconi | Francesca Bertagna | Nicoletta Calzolari | Andrea Marchetti | Monica Monachini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present LeXFlow, a web application framework where lexicons already expressed in standardised format semi-automatically interact by reciprocally enriching themselves. LeXFlow is intended for, on the one hand, paving the way to the development of dynamic multi-source lexicons; and on the other, for fostering the adoption of standards. Borrowing from techniques used in the domain of document workflows, we model the activity of lexicon management as a particular case of workflow instance, where lexical entries move across agents and become dynamically updated. To this end, we have designed a lexical flow (LF) corresponding to the scenario where an entry of a lexicon A becomes enriched via basically two steps. First, by virtue of being mapped onto a corresponding entry belonging to a lexicon B, the entry(LA) inherits the semantic relations available in lexicon B. Second, by resorting to an automatic application that acquires information about semantic relations from corpora, the relations acquired are integrated into the entry and proposed to the human encoder. As a result of the lexical flow, in addition, for each starting lexical entry(LA) mapped onto a corresponding entry(LB) the flow produces a new entry representing the merging of the original two.

pdf
Towards Agent-based Cross-Lingual Interoperability of Distributed Lexical Resources
Claudia Soria | Maurizio Tesconi | Andrea Marchetti | Francesca Bertagna | Monica Monachini | Chu-Ren Huang | Nicoletta Calzolari
Proceedings of the Workshop on Multilingual Language Resources and Interoperability