German Rigau

Also published as: G. Rigau

2021

pdf bib abs
Ask2Transformers: Zero-Shot Domain labelling with Pretrained Language Models
Oscar Sainz | German Rigau
Proceedings of the 11th Global Wordnet Conference

In this paper we present a system that exploits different pre-trained Language Models for assigning domain labels to WordNet synsets without any kind of supervision. Furthermore, the system is not restricted to use a particular set of domain labels. We exploit the knowledge encoded within different off-the-shelf pre-trained Language Models and task formulations to infer the domain label of a particular WordNet definition. The proposed zero-shot system achieves a new state-of-the-art on the English dataset used in the evaluation.

pdf bib abs
Benchmarking Meta-embeddings: What Works and What Does Not
Iker García-Ferrero | Rodrigo Agerri | German Rigau
Findings of the Association for Computational Linguistics: EMNLP 2021

In the last few years, several methods have been proposed to build meta-embeddings. The general aim was to obtain new representations integrating complementary knowledge from different source pre-trained embeddings thereby improving their overall quality. However, previous meta-embeddings have been evaluated using a variety of methods and datasets, which makes it difficult to draw meaningful conclusions regarding the merits of each approach. In this paper we propose a unified common framework, including both intrinsic and extrinsic tasks, for a fair and objective meta-embeddings evaluation. Furthermore, we present a new method to generate meta-embeddings, outperforming previous work on a large number of intrinsic evaluation benchmarks. Our evaluation framework also allows us to conclude that previous extrinsic evaluations of meta-embeddings have been overestimated.

2020

pdf bib
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)
Thierry Declerk | Itziar Gonzalez-Dios | German Rigau
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

pdf bib abs
Towards modelling SUMO attributes through WordNet adjectives: a Case Study on Qualities
Itziar Gonzalez-Dios | Javier Alvez | German Rigau
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

Previous studies have shown that the knowledge about attributes and properties in the SUMO ontology and its mapping to WordNet adjectives lacks of an accurate and complete characterization. A proper characterization of this type of knowledge is required to perform formal commonsense reasoning based on the SUMO properties, for instance to distinguish one concept from another based on their properties. In this context, we propose a new semi-automatic approach to model the knowledge about properties and attributes in SUMO by exploiting the information encoded in WordNet adjectives and its mapping to SUMO. To that end, we considered clusters of semantically related groups of WordNet adjectival and nominal synsets. Based on these clusters, we propose a new semi-automatic model for SUMO attributes and their mapping to WordNet, which also includes polarity information. In this paper, as an exploratory approach, we focus on qualities.

pdf bib abs
Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus
Elena Zotova | Rodrigo Agerri | Manuel Nuñez | German Rigau
Proceedings of the 12th Language Resources and Evaluation Conference

Stance detection aims to determine the attitude of a given text with respect to a specific topic or claim. While stance detection has been fairly well researched in the last years, most the work has been focused on English. This is mainly due to the relative lack of annotated data in other languages. The TW-10 referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish. Unfortunately, the TW-10 Catalan subset is extremely imbalanced. This paper addresses these issues by presenting a new multilingual dataset for stance detection in Twitter for the Catalan and Spanish languages, with the aim of facilitating research on stance detection in multilingual and cross-lingual settings. The dataset is annotated with stance towards one topic, namely, the ndependence of Catalonia. We also provide a semi-automatic method to annotate the dataset based on a categorization of Twitter users. We experiment on the new corpus with a number of supervised approaches, including linear classifiers and deep learning methods. Comparison of our new corpus with the with the TW-1O dataset shows both the benefits and potential of a well balanced corpus for multilingual and cross-lingual research on stance detection. Finally, we establish new state-of-the-art results on the TW-10 dataset, both for Catalan and Spanish.

pdf bib abs
NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts
Salvador Lima Lopez | Naiara Perez | Montse Cuadros | German Rigau
Proceedings of the 12th Language Resources and Evaluation Conference

This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of an on-going research and currently consists of 29,682 sentences obtained from anonymised health records annotated with negation and uncertainty. The article includes an exhaustive comparison with similar corpora in Spanish, and presents the main annotation and design decisions. Additionally, we perform preliminary experiments using deep learning algorithms to validate the annotated dataset. As far as we know, NUBes is the largest available corpora for negation in Spanish and the first that also incorporates the annotation of speculation cues, scopes, and events.

2019

pdf bib abs
Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis
Javier Álvez | Itziar Gonzalez-Dios | German Rigau
Proceedings of the 10th Global Wordnet Conference

We describe a detailed analysis of a sample of large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect some knowledge misalignments, mapping errors and lack of knowledge and resources. Our final objective is the extraction of some guidelines towards a better exploitation of this commonsense knowledge framework by the improvement of the included resources.

2018

pdf bib abs
Towards Cross-checking WordNet and SUMO Using Meronymy
Javier Álvez | German Rigau
Proceedings of the 9th Global Wordnet Conference

We describe the practical application of a black-box testing methodology for the validation of the knowledge encoded in WordNet, SUMO and their mapping by using automated theorem provers. In this paper,weconcentrateonthepart-whole information provided by WordNet and create a large set of tests on the basis of few question patterns. From our preliminary evaluation results, we report on some of the detected inconsistencies.

pdf bib
Biomedical term normalization of EHRs with UMLS
Naiara Perez-Miguel | Montse Cuadros | German Rigau
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Developing New Linguistic Resources and Tools for the Galician Language
Rodrigo Agerri | Xavier Gómez Guinovart | German Rigau | Miguel Anxo Solla Portela
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Building Named Entity Recognition Taggers via Parallel Corpora
Rodrigo Agerri | Yiling Chung | Itziar Aldabe | Nora Aranberri | Gorka Labaka | German Rigau
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Cross-checking WordNet and SUMO Using Meronymy
Javier Álvez | Itziar Gonzalez-Dios | German Rigau
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

This paper presents the Event and Implied Situation Ontology (ESO), a resource which formalizes the pre and post situations of events and the roles of the entities affected by an event. The ontology reuses and maps across existing resources such as WordNet, SUMO, VerbNet, PropBank and FrameNet. We describe how ESO is injected into a new version of the Predicate Matrix and illustrate how these resources are used to detect information in large document collections that otherwise would have remained implicit. The model targets interpretations of situations rather than the semantics of verbs per se. The event is interpreted as a situation using RDF taking all event components into account. Hence, the ontology and the linked resources need to be considered from the perspective of this interpretation model.

pdf bib abs
A Comparison of Domain-based Word Polarity Estimation using different Word Embeddings
Aitor García Pablos | Montse Cuadros | German Rigau
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

A key point in Sentiment Analysis is to determine the polarity of the sentiment implied by a certain word or expression. In basic Sentiment Analysis systems this sentiment polarity of the words is accounted and weighted in different ways to provide a degree of positivity/negativity. Currently words are also modelled as continuous dense vectors, known as word embeddings, which seem to encode interesting semantic knowledge. With regard to Sentiment Analysis, word embeddings are used as features to more complex supervised classification systems to obtain sentiment classifiers. In this paper we compare a set of existing sentiment lexicons and sentiment lexicon generation techniques. We also show a simple but effective technique to calculate a word polarity value for each word in a domain using existing continuous word embeddings generation methods. Further, we also show that word embeddings calculated on in-domain corpus capture the polarity better than the ones calculated on general-domain corpus.

pdf bib abs
The Event and Implied Situation Ontology (ESO): Application and Evaluation
Roxane Segers | Marco Rospocher | Piek Vossen | Egoitz Laparra | German Rigau | Anne-Lyse Minard
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the Event and Implied Situation Ontology (ESO), a manually constructed resource which formalizes the pre and post situations of events and the roles of the entities affected by an event. The ontology is built on top of existing resources such as WordNet, SUMO and FrameNet. The ontology is injected to the Predicate Matrix, a resource that integrates predicate and role information from amongst others FrameNet, VerbNet, PropBank, NomBank and WordNet. We illustrate how these resources are used on large document collections to detect information that otherwise would have remained implicit. The ontology is evaluated on two aspects: recall and precision based on a manually annotated corpus and secondly, on the quality of the knowledge inferred by the situation assertions in the ontology. Evaluation results on the quality of the system show that 50% of the events typed and enriched with ESO assertions are correct.

pdf bib abs
Addressing the MFS Bias in WSD systems
Marten Postma | Ruben Izquierdo | Eneko Agirre | German Rigau | Piek Vossen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Word Sense Disambiguation (WSD) systems tend to have a strong bias towards assigning the Most Frequent Sense (MFS), which results in high performance on the MFS but in a very low performance on the less frequent senses. We addressed the MFS bias in WSD systems by combining the output from a WSD system with a set of mostly static features to create a MFS classifier to decide when to and not to choose the MFS. The output from this MFS classifier, which is based on the Random Forest algorithm, is then used to modify the output from the original WSD system. We applied our classifier to one of the state-of-the-art supervised WSD systems, i.e. IMS, and to of the best state-of-the-art unsupervised WSD systems, i.e. UKB. Our main finding is that we are able to improve the system output in terms of choosing between the MFS and the less frequent senses. When we apply the MFS classifier to fine-grained WSD, we observe an improvement on the less frequent sense cases, whereas we maintain the overall recall.

pdf bib abs
A Multilingual Predicate Matrix
Maddalen Lopez de Lacalle | Egoitz Laparra | Itziar Aldabe | German Rigau
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the Predicate Matrix 1.3, a lexical resource resulting from the integration of multiple sources of predicate information including FrameNet, VerbNet, PropBank and WordNet. This new version of the Predicate Matrix has been extended to cover nominal predicates by adding mappings to NomBank. Similarly, we have integrated resources in Spanish, Catalan and Basque. As a result, the Predicate Matrix 1.3 provides a multilingual lexicon to allow interoperable semantic analysis in multiple languages.

pdf bib
SemEval-2016 Task 2: Interpretable Semantic Textual Similarity
Eneko Agirre | Aitor Gonzalez-Agirre | Iñigo Lopez-Gazpio | Montse Maritxalar | German Rigau | Larraitz Uria
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Semantic Interoperability for Cross-lingual and cross-document Event Detection
Piek Vossen | Egoitz Laparra | German Rigau | Itziar Aldabe
Proceedings of the The 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

pdf bib
From TimeLines to StoryLines: A preliminary proposal for evaluating narratives
Egoitz Laparra | Itziar Aldabe | German Rigau
Proceedings of the First Workshop on Computing News Storylines

pdf bib
Proceedings of the Second Workshop on Natural Language Processing and Linked Open Data
Piek Vossen | German Rigau | Petya Osenova | Kiril Simov
Proceedings of the Second Workshop on Natural Language Processing and Linked Open Data

pdf bib
Cross-lingual Event Detection in Discourse
German Rigau
Proceedings of the Second Workshop on Natural Language Processing and Linked Open Data

pdf bib
Document Level Time-anchoring for TimeLine Extraction
Egoitz Laparra | Itziar Aldabe | German Rigau
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
UBC: Cubes for English Semantic Textual Similarity and Supervised Approaches for Interpretable STS
Eneko Agirre | Aitor Gonzalez-Agirre | Iñigo Lopez-Gazpio | Montse Maritxalar | German Rigau | Larraitz Uria
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
V3: Unsupervised Aspect Based Sentiment Analysis for SemEval2015 Task 12
Aitor García-Pablos | Montse Cuadros | German Rigau
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib abs
NewsReader: recording history from daily news streams
Piek Vossen | German Rigau | Luciano Serafini | Pim Stouten | Francis Irving | Willem Van Hage
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The European project NewsReader develops technology to process daily news streams in 4 languages, extracting what happened, when, where and who was involved. NewsReader does not just read a single newspaper but massive amounts of news coming from thousands of sources. It compares the results across sources to complement information and determine where they disagree. Furthermore, it merges news of today with previous news, creating a long-term history rather than separate events. The result is stored in a KnowledgeStore, that cumulates information over time, producing an extremely large knowledge graph that is visualized using new techniques to provide more comprehensive access. We present the first version of the system and the results of processing first batches of data.

pdf bib abs
Predicate Matrix: extending SemLink through WordNet mappings
Maddalen Lopez de Lacalle | Egoitz Laparra | German Rigau
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the Predicate Matrix v1.1, a new lexical resource resulting from the integration of multiple sources of predicate information including FrameNet, VerbNet, PropBank and WordNet. We start from the basis of SemLink. Then, we use advanced graph-based algorithms to further extend the mapping coverage of SemLink. Second, we also exploit the current content of SemLink to infer new role mappings among the different predicate schemas. As a result, we have obtained a new version of the Predicate Matrix which largely extends the current coverage of SemLink and the previous version of the Predicate Matrix.

pdf bib abs
IXA pipeline: Efficient and Ready to Use Multilingual NLP tools
Rodrigo Agerri | Josu Bermudez | German Rigau
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

IXA pipeline is a modular set of Natural Language Processing tools (or pipes) which provide easy access to NLP technology. It offers robust and efficient linguistic annotation to both researchers and non-NLP experts with the aim of lowering the barriers of using NLP technology either for research purposes or for small industrial developers and SMEs. IXA pipeline can be used “as is” or exploit its modularity to pick and change different components. Given its open-source nature, it can also be modified and extended for it to work with other languages. This paper describes the general data-centric architecture of IXA pipeline and presents competitive results in several NLP annotations for English and Spanish.

pdf bib
First steps towards a Predicate Matrix
Maddalen López de Lacalle | Egoitz Laparra | German Rigau
Proceedings of the Seventh Global Wordnet Conference

pdf bib
V3: Unsupervised Generation of Domain Aspect Terms for Aspect Based Sentiment Analysis
Aitor García-Pablos | Montse Cuadros | German Rigau
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Simple, Robust and (almost) Unsupervised Generation of Polarity Lexicons for Multiple Languages
Iñaki San Vicente | Rodrigo Agerri | German Rigau
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Multilingual, Efficient and Easy NLP Processing with IXA Pipeline
Rodrigo Agerri | Josu Bermudez | German Rigau
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

pdf bib
UBC_UOS-TYPED: Regression for typed-similarity
Eneko Agirre | Nikolaos Aletras | Aitor Gonzalez-Agirre | German Rigau | Mark Stevenson
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

pdf bib
Sources of Evidence for Implicit Argument Resolution
Egoitz Laparra | German Rigau
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

pdf bib
ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling
Egoitz Laparra | German Rigau
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib abs
Multilingual Central Repository version 3.0
Aitor Gonzalez-Agirre | Egoitz Laparra | German Rigau
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the upgrading process of the Multilingual Central Repository (MCR). The new MCR uses WordNet 3.0 as Interlingual-Index (ILI). Now, the current version of the MCR integrates in the same EuroWordNet framework wordnets from five different languages: English, Spanish, Catalan, Basque and Galician. In order to provide ontological coherence to all the integrated wordnets, the MCR has also been enriched with a disparate set of ontologies: Base Concepts, Top Ontology, WordNet Domains and Suggested Upper Merged Ontology. The whole content of the MCR is freely available.

pdf bib abs
A proposal for improving WordNet Domains
Aitor González-Agirre | Mauro Castillo | German Rigau
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

WordNet Domains (WND) is a lexical resource where synsets have been semi-automatically annotated with one or more domain labels from a set of 165 hierarchically organized domains. The uses of WND include the power to reduce the polysemy degree of the words, grouping those senses that belong to the same domain. But the semi-automatic method used to develop this resource was far from being perfect. By cross-checking the content of the Multilingual Central Repository (MCR) it is possible to find some errors and inconsistencies. Many are very subtle. Others, however, leave no doubt. Moreover, it is very difficult to quantify the number of errors in the original version of WND. This paper presents a novel semi-automatic method to propagate domain information through the MCR. We also compare both labellings (the original and the new one) allowing us to detect anomalies in the original WND labels.

pdf bib abs
Highlighting relevant concepts from Topic Signatures
Montse Cuadros | Lluís Padró | German Rigau
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents deepKnowNet, a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method applies a knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate WordNet sense to large sets of topically related words acquired from the web, named TSWEB. This Word Sense Disambiguation algorithm is the personalized PageRank algorithm implemented in UKB. This new method improves by automatic means the current content of WordNet by creating large volumes of new and accurate semantic relations between synsets. KnowNet was our first attempt towards the acquisition of large volumes of semantic relations. However, KnowNet had some limitations that have been overcomed with deepKnowNet. deepKnowNet disambiguates the first hundred words of all Topic Signatures from the web (TSWEB). In this case, the method highlights the most relevant word senses of each Topic Signature and filter out the ones that are not so related to the topic. In fact, the knowledge it contains outperforms any other resource when is empirically evaluated in a common framework based on a similarity task annotated with human judgements.

pdf bib abs
Mapping WordNet to the Kyoto ontology
Egoitz Laparra | German Rigau | Piek Vossen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the connection of WordNet to a generic ontology based on DOLCE. We developed a complete set of heuristics for mapping all WordNet nouns, verbs and adjectives to the ontology. Moreover, the mapping also allows to represent predicates in a uniform and interoperable way, regardless of the way they are expressed in the text and in which language. Together with the ontology, the WordNet mappings provide a extremely rich and powerful basis for semantic processing of text in any domain. In particular, the mapping has been used in a knowledge-rich event-mining system developed for the Asian-European project KYOTO.

2011

pdf bib
Using Kybots for Extracting Events in Biomedical Texts
Arantza Casillas | Arantza Díaz de Ilarraza | Koldo Gojenola | Maite Oronoz | German Rigau
Proceedings of BioNLP Shared Task 2011 Workshop

2010

pdf bib abs
Wikicorpus: A Word-Sense Disambiguated Multilingual Wikipedia Corpus
Samuel Reese | Gemma Boleda | Montse Cuadros | Lluís Padró | German Rigau
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This article presents a new freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of the Wikipedia and has been automatically enriched with linguistic information. To our knowledge, this is the largest such corpus that is freely available to the community: In its present version, it contains over 750 million words. The corpora have been annotated with lemma and part of speech information using the open source library FreeLing. Also, they have been sense annotated with the state of the art Word Sense Disambiguation algorithm UKB. As UKB assigns WordNet senses, and WordNet has been aligned across languages via the InterLingual Index, this sort of annotation opens the way to massive explorations in lexical semantics that were not possible before. We present a first attempt at creating a trilingual lexical resource from the sense-tagged Wikipedia corpora, namely, WikiNet. Moreover, we present two by-products of the project that are of use for the NLP community: An open source Java-based parser for Wikipedia pages developed for the construction of the corpus, and the integration of the WSD algorithm UKB in FreeLing.

pdf bib abs
Exploring Knowledge Bases for Similarity
Eneko Agirre | Montse Cuadros | German Rigau | Aitor Soroa
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Graph-based similarity over WordNet has been previously shown to perform very well on word similarity. This paper presents a study of the performance of such a graph-based algorithm when using different relations and versions of Wordnet. The graph algorithm is based on Personalized PageRank, a random-walk based algorithm which computes the probability of a random-walk initiated in the target word to reach any synset following the relations in WordNet (Haveliwala, 2002). Similarity is computed as the cosine of the probability distributions for each word over WordNet. The best combination of relations includes all relations in WordNet 3.0, included disambiguated glosses, and automatically disambiguated topic signatures called KnowNets. All relations are part of the official release of WordNet, except KnowNets, which have been derived automatically. The results over the WordSim 353 dataset show that using the adequate relations the performance improves over previously published WordNet-based results on the WordSim353 dataset (Finkelstein et al., 2002). The similarity software and some graphs used in this paper are publicly available at http://ixa2.si.ehu.es/ukb.

pdf bib abs
Integrating a Large Domain Ontology of Species into WordNet
Montse Cuadros | Egoitz Laparra | German Rigau | Piek Vossen | Wauter Bosma
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

With the proliferation of applications sharing information represented in multiple ontologies, the development of automatic methods for robust and accurate ontology matching will be crucial to their success. Connecting and merging already existing semantic networks is perhaps one of the most challenging task related to knowledge engineering. This paper presents a new approach for aligning automatically a very large domain ontology of Species to WordNet in the framework of the KYOTO project. The approach relies on the use of knowledge-based Word Sense Disambiguation algorithm which accurately assigns WordNet synsets to the concepts represented in Species 2000.

pdf bib abs
eXtended WordFrameNet
Egoitz Laparra | German Rigau
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a novel automatic approach to partially integrate FrameNet and WordNet. In that way we expect to extend FrameNet coverage, to enrich WordNet with frame semantic information and possibly to extend FrameNet to languages other than English. The method uses a knowledge-based Word Sense Disambiguation algorithm for matching the FrameNet lexical units to WordNet synsets. Specifically, we exploit a graph-based Word Sense Disambiguation algorithm that uses a large-scale knowledge-base derived from existing semantic resources. We have developed and tested additional versions of this algorithm showing substantial improvements over state-of-the-art results. Finally, we show some examples and figures of the resulting semantic resource.

pdf bib
GPLSI-IXA: Using Semantic Classes to Acquire Monosemous Training Examples from Domain Texts
Rubén Izquierdo | Armando Suárez | German Rigau
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

pdf bib
An Empirical Study on Class-Based Word Sense Disambiguation
Rubén Izquierdo | Armando Suárez | German Rigau
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Integrating WordNet and FrameNet using a Knowledge-based Word Sense Disambiguation Algorithm
Egoitz Laparra | German Rigau
Proceedings of the International Conference RANLP-2009

2008

This paper presents the complete and consistent ontological annotation of the nominal part of WordNet. The annotation has been carried out using the semantic features defined in the EuroWordNet Top Concept Ontology and made available to the NLP community. Up to now only an initial core set of 1,024 synsets, the so-called Base Concepts, was ontologized in such a way. The work has been achieved by following a methodology based on an iterative and incremental expansion of the initial labeling through the hierarchy while setting inheritance blockage points. Since this labeling has been set on the EuroWordNets Interlingual Index (ILI), it can be also used to populate any other wordnet linked to it through a simple porting process. This feature-annotated WordNet is intended to be useful for a large number of semantic NLP tasks and for testing for the first time componential analysis on real environments. Moreover, the quantitative analysis of the work shows that more than 40% of the nominal part of WordNet is involved in structure errors or inadequacies.

We outline work performed within the framework of a current EC project. The goal is to construct a language-independent information system for a specific domain (environment/ecology/biodiversity) anchored in a language-independent ontology that is linked to wordnets in seven languages. For each language, information extraction and identification of lexicalized concepts with ontological entries is carried out by text miners (Kybots). The mapping of language-specific lexemes to the ontology allows for crosslinguistic identification and translation of equivalent terms. The infrastructure developed within this project enables long-range knowledge sharing and transfer across many languages and cultures, addressing the need for global and uniform transition of knowledge beyond the specific domains addressed here.

pdf bib abs
WNTERM: Enriching the MCR with a Terminological Dictionary
Eli Pociello | Antton Gurrutxaga | Eneko Agirre | Izaskun Aldezabal | German Rigau
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we describe the methodology and the first steps for the creation of WNTERM (from WordNet and Terminology), a specialized lexicon produced from the merger of the EuroWordNet-based Multilingual Central Repository (MCR) and the Basic Encyclopaedic Dictionary of Science and Technology (BDST). As an example, the ecology domain has been used. The final result is a multilingual (Basque and English) light-weight domain ontology, including taxonomic and other semantic relations among its concepts, which is tightly connected to other wordnets.

pdf bib
KnowNet: A Proposal for Building Highly Connected and Dense Knowledge Bases from the Web
Montse Cuadros | German Rigau
Semantics in Text Processing. STEP 2008 Conference Proceedings

pdf bib
KnowNet: Building a Large Net of Knowledge from the Web
Montse Cuadros | German Rigau
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)