Irene Castellón

Also published as: Irene Castellon


Enhancing FreeLing Rule-Based Dependency Grammars with Subcategorization Frames
Marina Lloberes | Irene Castellón | Lluís Padró
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Suitability of ParTes Test Suite for Parsing Evaluation
Marina Lloberes | Irene Castellón | Lluís Padró
Proceedings of the 14th International Conference on Parsing Technologies


VERTa: Facing a Multilingual Experience of a Linguistically-based MT Evaluation
Elisabet Comelles | Jordi Atserias | Victoria Arranz | Irene Castellón | Jordi Sesé
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

There are several MT metrics used to evaluate translation into Spanish, although most of them use partial or little linguistic information. In this paper we present the multilingual capability of VERTa, an automatic MT metric that combines linguistic information at lexical, morphological, syntactic and semantic level. In the experiments conducted we aim at identifying those linguistic features that prove the most effective to evaluate adequacy in Spanish segments. This linguistic information is tested both as independent modules (to observe what each type of feature provides) and in a combinatory fashion (where different kinds of information interact with each other). This allows us to extract the optimal combination. In addition we compare these linguistic features to those used in previous versions of VERTa aimed at evaluating adequacy for English segments. Finally, experiments show that VERTa can be easily adapted to languages other than English and that its collaborative approach correlates better with human judgements on adequacy than other well-known metrics.


VERTa: Linguistic features in MT evaluation
Elisabet Comelles | Jordi Atserias | Victoria Arranz | Irene Castellón
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In the last decades, a wide range of automatic metrics that use linguistic knowledge has been developed. Some of them are based on lexical information, such as METEOR; others rely on the use of syntax, either using constituent or dependency analysis; and others use semantic information, such as Named Entities and semantic roles. All these metrics work at a specific linguistic level, but some researchers have tried to combine linguistic information, either by combining several metrics following a machine-learning approach or focusing on the combination of a wide variety of metrics in a simple and straightforward way. However, little research has been conducted on how to combine linguistic features from a linguistic point of view. In this paper we present VERTa, a metric which aims at using and combining a wide variety of linguistic features at lexical, morphological, syntactic and semantic level. We provide a description of the metric and report some preliminary experiments which will help us to discuss the use and combination of certain linguistic features in order to improve the metric performance.


Document-Level Automatic MT Evaluation based on Discourse Representations
Elisabet Comelles | Jesús Giménez | Lluís Màrquez | Irene Castellón | Victoria Arranz
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

FreeLing 2.1: Five Years of Open-source Language Processing Tools
Lluís Padró | Miquel Collado | Samuel Reese | Marina Lloberes | Irene Castellón
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

FreeLing is an open-source multilingual language processing library providing a wide range of language analyzers for several languages. It offers text processing and language annotation facilities to natural language processing application developers, simplifying the task of building those applications. FreeLing is customizable and extensible. Developers can use the default linguistic resources (dictionaries, lexicons, grammars, etc.) directly, or extend them, adapt them to specific domains, or even develop new ones for specific languages. This paper overviews the recent history of this tool, summarizes the improvements and extensions incorporated in the latest version, and depicts the architecture of the library. Special focus is brought to the fact and consequences of the library being open-source: After five years and over 35,000 downloads, a growing user community has extended the initial three languages (English, Spanish and Catalan) to eight (adding Galician, Italian, Welsh, Portuguese, and Asturian), proving that the collaborative open model is a productive approach for the development of NLP tools and resources.

Spanish FreeLing Dependency Grammar
Marina Lloberes | Irene Castellón | Lluís Padró
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the development of an open-source Spanish Dependency Grammar implemented in the FreeLing environment. This grammar was designed as a resource for NLP applications that require a step further in natural language automatic analysis, as is the case of Spanish-to-Basque translation. The development of wide-coverage rule-based grammars using linguistic knowledge contributes to extending the existing Spanish deep parsers collection, which sometimes is limited. Spanish FreeLing Dependency Grammar, named EsTxala, provides deep and robust parse trees, solving attachments for any structure and assigning syntactic functions to dependencies. These steps are dealt with by hand-written rules based on linguistic knowledge. As a result, FreeLing Dependency Parser gives a unique analysis as a dependency tree for each sentence analyzed. Since it is a resource open to the scientific community, exhaustive grammar evaluation is being done to determine its accuracy as well as strategies for its maintenance and improvement. In this paper, we show the results of an experimental evaluation carried out over EsTxala in order to test our evaluation methodology.


Towards Spanish Verbs’ Selectional Preferences Automatic Acquisition: Semantic Annotation of the SenSem Corpus
Jordi Carrera | Irene Castellón | Salvador Climent | Marta Coll-Florit
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present the results of an agreement task carried out in the framework of the KNOW Project and consisting in manually annotating an agreement sample totaling 50 sentences extracted from the SenSem corpus. Disambiguation was carried out for all nouns, proper nouns and adjectives in the sample, all of which were assigned EuroWordNet (EWN) synsets. As a result of the task, Spanish WN has been shown to exhibit 1) lack of explanatory clarity (it does not define word meanings, but glosses and exemplifies them instead; it does not systematically encode metaphoric meanings, either); 2) structural inadequacy (some words appear as hyponyms of another sense of the same word; sometimes there even coexist in Spanish WN a general sense and a specific one related to the same concept, but with no structural link in between; hyperonymy relationships have been detected that are likely to raise doubts to human annotators; there can even be found cases of auto-hyponymy); 3) cross-linguistic inconsistency (there exist in English EWN concepts whose lexical equivalent is missing in Spanish WN; glosses in one language more often than not contradict or diverge from glosses in another language).


The Sensem Corpus: a Corpus Annotated at the Syntactic and Semantic Level
Irene Castellón | Ana Fernández-Montraveta | Gloria Vázquez | Laura Alonso Alemany | Joan Antoni Capilla
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The primary aim of the project SENSEM (Sentence Semantics, BFF2003-06456) is the construction of a Lexical Data Base illustrating the syntactic and semantic behavior of each of the senses of the 250 most frequent verbs of Spanish. With this objective in mind, we are currently building an annotated corpus consisting of sentences extracted from the electronic version of the newspaper El Periódico de Catalunya, totalling approximately 1 million words, with 100 examples of each verb. By the time of the conference, we will be about to complete the annotation of 25,000 sentences, which means roughly a corpus of 800,000 words. Approximately 400,000 of them will have been revised. We expect to make the corpus publicly available by the end of 2006.


Multiple Sequence Alignment for Characterizing the Lineal Structure of Revision
Laura Alonso | Irene Castellón | Jordi Escribano | Xavier Messeguer | Lluís Padró
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

We present a first approach to the application of a data mining technique, Multiple Sequence Alignment, to the systematization of a polemic aspect of discourse, namely, the expression of contrast, concession, counterargument and semantically similar discursive relations. The representation of the phenomena under study is carried out by very simple techniques, mostly pattern-matching, but the results allow us to draw insightful conclusions on the organization of this aspect of discourse: equivalence classes of discourse markers are established, and systematic patterns are discovered, which will be applied in enhancing a discursive parser.

Semantic Categorization of Spanish Se-constructions
Glòria Vázquez | Ana Fernández Montraveta | Irene Castellón | Laura Alonso
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Knowledge intensive e-mail summarization in CARPANTA
Laura Alonso | Irene Castellón | Bernardino Casas | Lluís Padró
Proceedings of the ACL Interactive Poster and Demonstration Sessions


Automatic Lexical Acquisition from Raw Corpora: An Application to Russian
Antoni Oliver | Irene Castellón | Lluís Màrquez
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages


On the concept of diathesis alternations as semantic oppositions
Ana Fernandez | M. Antonia Marti | Gloria Vazquez | Irene Castellon
SIGLEX99: Standardizing Lexical Resources


Spanish EuroWordNet and LCS-based interlingual MT
Bonnie J. Dorr | M. Antonia Martí | Irene Castellón
AMTA/SIG-IL First Workshop on Interlinguas

We present a machine translation framework in which the interlingua, Lexical Conceptual Structure (LCS), is coupled with a definitional component that includes bilingual (EuroWordNet) links between words in the source and target languages. While the links between individual words are language-specific, the LCS is designed to be a language-independent, compositional representation. We take the view that the two types of information (shallower, transfer-like knowledge as well as deeper, compositional knowledge) can be reconciled in interlingual machine translation, the former for overcoming the intractability of LCS-based lexical selection, and the latter for relating the underlying semantics of two words cross-linguistically. We describe the acquisition process for these two information types and present results of hand-verification of the acquired lexicon. Finally, we demonstrate the utility of the two information types in interlingual MT.


SEISD: An environment for extraction of Semantic Information from on-line dictionaries
Alicia Ageno | Irene Castellon | M. A. Marti | German Rigau | Francesc Ribas | Horacio Rodriguez | Mariona Taule | Felisa Verdejo
Third Conference on Applied Natural Language Processing