Horacio Rodríguez

Also published as: Horacio Rodriguez

2017

UPC-USMBA at SemEval-2017 Task 3: Combining multiple approaches for CQA for Arabic
Yassine El Adlouni | Imane Lahbari | Horacio Rodríguez | Mohammed Meknassi | Said Ouatik El Alaoui | Noureddine Ennahnahi
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the participation of the UPC-USMBA team in SemEval 2017 Task 3, subtask D (Arabic). Our approach to the task is based on a combination of a set of atomic classifiers. The atomic classifiers include lexical string-based, vector representation-based, and rule-based ones. Several combination approaches have been tried.

This article reports an intrinsic automatic summarization evaluation in the scientific lecture domain. The lecture takes place in a Smart Room that has access to different types of documents produced from different media. An evaluation framework is presented to analyze the performance of systems producing summaries answering a user need. Several ROUGE metrics are used and a manual content responsiveness evaluation was carried out in order to analyze the performance of the evaluated approaches. Various multilingual summarization approaches are analyzed showing that the use of different types of documents outperforms the use of transcripts. In fact, not using any part of the spontaneous speech transcription in the summary improves the performance of automatic summaries. Moreover, the use of semantic information represented in the different textual documents coming from different media helps to improve summary quality.

2011

Cultural Configuration of Wikipedia: measuring Autoreferentiality in Different Languages
Marc Miquel Ribé | Horacio Rodríguez
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

ADN-Classifier: Automatically Assigning Denotation Types to Nominalizations
Aina Peris | Mariona Taulé | Gemma Boleda | Horacio Rodríguez
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the ADN-Classifier, an Automatic classification system of Spanish Deverbal Nominalizations aimed at identifying their semantic denotation (i.e. event, result, underspecified, or lexicalized). The classifier can be used for NLP tasks such as coreference resolution or paraphrase detection. To our knowledge, the ADN-Classifier is the first effort in acquisition of denotations for nominalizations using Machine Learning. We compare the results of the classifier when using a decreasing number of Knowledge Sources, namely (1) the complete nominal lexicon (AnCora-Nom) that includes sense distinctions, (2) the nominal lexicon (AnCora-Nom) removing the sense-specific information, (3) nominalizations context information obtained from a treebank corpus (AnCora-Es) and (4) the combination of the previous linguistic resources. In a realistic scenario, that is, without sense distinction, the best results achieved are those taking into account the information declared in the lexicon (89.40% accuracy). This shows that the lexicon contains crucial information (such as argument structure) that corpus-derived features cannot substitute for.

Finding Domain Terms using Wikipedia
Jorge Vivaldi | Horacio Rodríguez
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present a new approach for obtaining the terminology of a given domain using the category and page structures of Wikipedia in a language-independent way. For each domain, our approach consists of navigating the Category graph of Wikipedia starting from the root nodes associated with the domain. A heavy filtering mechanism is applied to prevent, as far as possible, the inclusion of spurious categories. For each selected category, all the pages belonging to it are then retrieved and filtered. This procedure is iterated several times until convergence is achieved. Both category names and page names are considered candidates to belong to the terminology of the domain. This approach has been applied to three broad-coverage domains (astronomy, chemistry and medicine) and two languages (English and Spanish), showing promising performance.
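
The iterative procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `SUBCATS` and `PAGES` dictionaries stand in for Wikipedia's category and page structures, and `keep_category` and `keep_page` are hypothetical placeholder filters (the abstract does not specify the actual filtering criteria).

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy stand-ins for the Wikipedia category graph and category-to-page map (assumed data).
SUBCATS: Dict[str, List[str]] = {
    "Astronomy": ["Stars", "Astronomers"],
    "Stars": ["Binary stars"],
}
PAGES: Dict[str, List[str]] = {
    "Astronomy": ["Telescope"],
    "Stars": ["Red giant"],
    "Binary stars": ["Sirius"],
    "Astronomers": ["Galileo Galilei"],
}


def collect_candidates(
    roots: Set[str],
    keep_category: Callable[[str], bool],
    keep_page: Callable[[str], bool],
    max_iters: int = 20,
) -> Tuple[Set[str], Set[str]]:
    """Expand categories from the roots, filter, and repeat until nothing changes."""
    categories: Set[str] = set(roots)
    pages: Set[str] = set()
    for _ in range(max_iters):
        # Expand the current categories by one level, keeping only accepted children.
        new_categories = set(categories)
        for cat in categories:
            for child in SUBCATS.get(cat, []):
                if keep_category(child):
                    new_categories.add(child)
        # Recover and filter the pages of every selected category.
        new_pages = {
            page
            for cat in new_categories
            for page in PAGES.get(cat, [])
            if keep_page(page)
        }
        # Convergence: neither set changed in this iteration.
        if new_categories == categories and new_pages == pages:
            break
        categories, pages = new_categories, new_pages
    return categories, pages


# Hypothetical filters: drop a category about people, accept every page.
cats, pgs = collect_candidates(
    roots={"Astronomy"},
    keep_category=lambda c: c != "Astronomers",
    keep_page=lambda p: True,
)
term_candidates = cats | pgs  # both category names and page names are candidates
```

In this toy run the spurious "Astronomers" category is filtered out, so its page never enters the candidate set.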

2008

Arabic WordNet: Semi-automatic Extensions using Bayesian Inference
Horacio Rodríguez | David Farwell | Javi Ferreres | Manuel Bertran | Musa Alkhalifa | M. Antonia Martí
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This presentation focuses on the semi-automatic extension of Arabic WordNet (AWN) using lexical and morphological rules and applying Bayesian inference. We briefly report on the current status of AWN and propose a way of extending its coverage by taking advantage of a limited set of highly productive Arabic morphological rules for deriving a range of semantically related word forms from verb entries. The application of this set of rules, combined with the use of bilingual Arabic-English resources and Princeton's WordNet, allows the generation of a graph representing the semantic neighbourhood of the original word. In previous work, a set of associations between the hypothesized Arabic words and English synsets was proposed on the basis of this graph. Here, a novel approach to extending AWN is presented whereby a Bayesian Network is automatically built from the graph and then the net is used as an inferencing mechanism for scoring the set of candidate associations. Both on its own and in combination with the previous technique, this new approach has led to improved results.

2007

Support Vector Machines for Query-focused Summarization trained and evaluated on Pyramid data
Maria Fuentes | Enrique Alfonseca | Horacio Rodríguez
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Machine Learning with Semantic-Based Distances Between Sentences for Textual Entailment
Daniel Ferrés | Horacio Rodríguez
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

This paper introduces a recently initiated project that focuses on building a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Our aim is to develop a linguistic resource with a deep formal semantic foundation in order to capture the richness of Arabic as described in Elkateb (2005). Arabic WordNet is being constructed following methods developed for EuroWordNet (Vossen, 1998). In addition to the standard wordnet representation of senses, word meanings are also being defined with a machine understandable semantics in first order logic. The basis for this semantics is the Suggested Upper Merged Ontology and its associated domain ontologies (Niles and Pease, 2001). We will greatly extend the ontology and its set of mappings to provide formal terms and definitions for each synset. Tools to be developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004).

Experiments Adapting an Open-Domain Question Answering System to the Geographical Domain Using Scope-Based Resources
Daniel Ferrés | Horacio Rodríguez
Proceedings of the Workshop on Multilingual Question Answering - MLQA ‘06

Arabic WordNet is a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Arabic WordNet (AWN) is based on the design and contents of the universally accepted Princeton WordNet (PWN) and will be mappable straightforwardly onto PWN 2.0 and EuroWordNet (EWN), enabling translation on the lexical level to English and dozens of other languages. We have developed and linked the AWN with the Suggested Upper Merged Ontology (SUMO), where concepts are defined with machine interpretable semantics in first order logic (Niles and Pease, 2001). We have greatly extended the ontology and its set of mappings to provide formal terms and definitions for each synset. The end product would be a linguistic resource with a deep formal semantic foundation that is able to capture the richness of Arabic as described in Elkateb (2005). Tools we have developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004). In this paper we describe our methodology for building a lexical resource in Arabic and the challenge of Arabic for lexical resources.

2004

Automatic Classification of Geographic Named Entities
Daniel Ferrés | Marc Massot | Muntsa Padró | Horacio Rodríguez | Jordi Turmo
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Selecting the Correct English Synset for a Spanish Sense
Javier Farreres | Horacio Rodríguez
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This work tries to enrich the Spanish Wordnet using a Spanish taxonomy as a knowledge source. The Spanish taxonomy is composed of Spanish senses, while the Spanish Wordnet is composed of synsets, mostly linked to the English WordNet. A set of weighted associations between Spanish words and Wordnet synsets is used for inferring associations between both taxonomies.

Automatically Selecting Domain Markers for Terminology Extraction
Jorge Vivaldi | Horacio Rodríguez
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Some approaches to automatic terminology extraction from corpora rely on existing semantic resources to guide the detection of terms. Most of these systems exploit specialised resources, like UMLS in the medical domain, while a few try to take advantage of general-purpose semantic resources, like EuroWordNet (EWN). As the term extraction task is clearly domain-dependent, when a general-purpose resource without specific domain information is used, we need a way of attaching domain information to the units of the resource. For big resources it is desirable that this semantic enrichment be carried out automatically. Given a specific domain, our proposal aims to detect in EWN those units that can be considered domain markers (DM). We define a DM as an EWN entry whose attached strings belong to the domain, as do the variants of all its descendants through the hyponymy relation. The procedure we propose in this paper is fully automatic and, a priori, domain-independent. The only external knowledge it uses is a set of terms (an external vocabulary), each of which is considered to have at least one sense belonging to the domain.
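
The domain marker definition given in this abstract can be read as a simple check over the hyponymy hierarchy. The sketch below is a strict, illustrative reading of that definition only, not the selection procedure of the paper: `HYPONYMS`, `VARIANTS` and `DOMAIN_TERMS` are invented toy data, and requiring every variant to be in the vocabulary is an assumption.

```python
from typing import Dict, List, Set

# Toy hyponymy relation: entry -> direct hyponyms (assumed data, not EWN).
HYPONYMS: Dict[str, List[str]] = {
    "entity": ["disease", "chair"],
    "disease": ["infection", "cancer"],
    "infection": ["flu"],
}
# Variants (attached strings) of each entry.
VARIANTS: Dict[str, List[str]] = {
    "entity": ["entity"],
    "disease": ["disease", "illness"],
    "infection": ["infection"],
    "cancer": ["cancer"],
    "flu": ["flu", "influenza"],
    "chair": ["chair"],
}
# External vocabulary of terms assumed to belong to the (medical) domain.
DOMAIN_TERMS: Set[str] = {"disease", "illness", "infection", "cancer", "flu", "influenza"}


def descendants(entry: str) -> Set[str]:
    """All entries reachable from `entry` through the hyponymy relation."""
    seen: Set[str] = set()
    stack = list(HYPONYMS.get(entry, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(HYPONYMS.get(node, []))
    return seen


def is_domain_marker(entry: str) -> bool:
    """True if the variants of the entry and of all its descendants are domain terms."""
    nodes = {entry} | descendants(entry)
    return all(v in DOMAIN_TERMS for n in nodes for v in VARIANTS.get(n, []))


markers = {e for e in VARIANTS if is_domain_marker(e)}
```

Here "disease" qualifies as a marker, while "entity" does not because its descendant "chair" lies outside the domain vocabulary.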

Re-using High-quality Resources for Continued Evaluation of Automated Summarization Systems
Laura Alonso | Maria Fuentes | Marc Massot | Horacio Rodríguez
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Automatic Building Gazetteers of Co-referring Named Entities
Daniel Ferrés | Marc Massot | Muntsa Padró | Horacio Rodríguez | Jordi Turmo
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Semiautomatic Creation of Taxonomies
Javier Farreres | Horacio Rodríguez | Karina Gibert
COLING-02: SEMANET: Building and Using Semantic Networks

2001

Probabilistic Modelling of Island-Driven Parsing
Alicia Ageno | Horacio Rodríguez
Proceedings of the Seventh International Workshop on Parsing Technologies

1999

Improving POS Tagging Using Machine-Learning Techniques
Lluis Marquez | Horacio Rodriguez | Josep Carmona | Josep Montolio
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

Building Accurate Semantic Taxonomies from Monolingual MRDs
German Rigau | Horacio Rodriguez | Eneko Agirre
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Building Accurate Semantic Taxonomies from Monolingual MRDs
German Rigau | Horacio Rodriguez | Eneko Agirre
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1997

Parsers Optimization for Wide-coverage Unification-based Grammars using the Restriction Technique
Nora La Serna | Arantxa Díaz | Horacio Rodríguez
Proceedings of the Fifth International Workshop on Parsing Technologies

This article describes the methodology we have followed in order to improve the efficiency of a parsing algorithm for wide-coverage unification-based grammars. The technique used is the restriction technique (Shieber 85), which has been recognized as an important operation for obtaining efficient parsers for unification-based grammars. The main objective of the research is to determine how to choose appropriate restrictors when applying the restriction technique. We have developed a statistical model for selecting restrictors. Several experiments have been carried out in order to characterise those restrictors.
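
As a rough illustration of what applying a restrictor means, the sketch below keeps only the parts of a feature structure that lie along a chosen set of feature paths. It is a simplified reading of restriction and makes several assumptions: feature structures are plain nested dictionaries without reentrancy, the restrictor is a set of path tuples, and the `restrict` function and the example structure are hypothetical. It does not show the statistical model for selecting restrictors mentioned in the abstract.

```python
from typing import Any, Dict, Set, Tuple

FeatureStructure = Dict[str, Any]
Path = Tuple[str, ...]


def restrict(fs: FeatureStructure, restrictor: Set[Path], prefix: Path = ()) -> FeatureStructure:
    """Keep only the features of `fs` that lie on some path of the restrictor."""
    result: FeatureStructure = {}
    for feature, value in fs.items():
        path = prefix + (feature,)
        # Discard the feature unless its path is a prefix of a restrictor path.
        if not any(p[: len(path)] == path for p in restrictor):
            continue
        if isinstance(value, dict):
            # Complex value: keep only the sub-features allowed by the restrictor.
            result[feature] = restrict(value, restrictor, path)
        else:
            # Atomic value: keep it as is.
            result[feature] = value
    return result


# Invented example structure for a noun phrase.
noun_phrase: FeatureStructure = {
    "cat": "np",
    "head": {"agr": {"num": "sg", "per": "3"}, "form": "finite"},
    "subcat": {"first": {"cat": "np"}},
}
restrictor: Set[Path] = {("cat",), ("head", "agr", "num")}

restricted = restrict(noun_phrase, restrictor)
```

The restricted structure carries less information than the original, which is what makes operations over it cheaper; how much to keep is the restrictor selection problem the article addresses.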