Ruben Urizar

Also published as: R. Urizar, Rubén Urizar


2018

Verbal Multiword Expressions in Basque Corpora
Uxoa Iñurrieta | Itziar Aduriz | Ainara Estarrona | Itziar Gonzalez-Dios | Antton Gurrutxaga | Ruben Urizar | Iñaki Alegria
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper presents a Basque corpus in which Verbal Multiword Expressions (VMWEs) were annotated following universal guidelines. Information on the annotation is given, and some points for discussion about the guidelines are also proposed. The corpus is useful not only for NLP-related research, but also for drawing conclusions about Basque phraseology in comparison with other languages.

2016

MEANTIME, the NewsReader Multilingual Event and Time Corpus
Anne-Lyse Minard | Manuela Speranza | Ruben Urizar | Begoña Altuna | Marieke van Erp | Anneleen Schoen | Chantal van Son
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present the NewsReader MEANTIME corpus, a semantically annotated corpus of Wikinews articles. The corpus consists of 480 news articles, i.e. 120 English news articles and their translations in Spanish, Italian, and Dutch. MEANTIME contains annotations at different levels. The document-level annotation includes markables (e.g. entity mentions, event mentions, time expressions, and numerical expressions), relations between markables (modeling, for example, temporal information and semantic role labeling), and entity and event intra-document coreference. The corpus-level annotation includes entity and event cross-document coreference. Semantic annotation on the English section was performed manually; for the annotation in Italian, Spanish, and (partially) Dutch, a procedure was devised to automatically project the annotations on the English texts onto the translated texts, based on the manual alignment of the annotated elements; this not only sped up the annotation process but also provided cross-lingual coreference. The English section of the corpus was extended with timeline annotations for the SemEval 2015 TimeLine shared task. The “First CLIN Dutch Shared Task” at CLIN26 was based on the Dutch section, while the EVALITA 2016 FactA (Event Factuality Annotation) shared task, based on the Italian section, is currently being organized.

2015

SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering
Anne-Lyse Minard | Manuela Speranza | Eneko Agirre | Itziar Aldabe | Marieke van Erp | Bernardo Magnini | German Rigau | Rubén Urizar
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2010

A Morphological Processor Based on Foma for Biscayan (a Basque dialect)
Iñaki Alegria | Garbiñe Aranbarri | Klara Ceberio | Gorka Labaka | Bittor Laskurain | Ruben Urizar
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a new morphological processor for Biscayan, a dialect of Basque, developed on the basis of the description of the morphology of standard Basque. The database for the standard morphology has been extended for dialects, and an open-source tool for morphological description named foma is used for building the processor. Biscayan is a dialect of the Basque language spoken mainly in Biscay, a province in the west of the Basque Country. The description of the lexicon and the morphotactics (or word grammar) for standard Basque was carried out using a relational database, and the database has been extended to include dialectal variants linked to the standard entries. XuxenB, a spelling checker/corrector for this dialect, is the first application of this work. In addition to the basic analyzer used for spelling, a new transducer is included. It is an enhanced analyzer for linking standard forms with the corresponding dialectal ones. It is used in correction to generate proposals when standard forms that we want to replace with dialectal forms appear in the input text.

2004

Representation and Treatment of Multiword Expressions in Basque
Iñaki Alegria | Olatz Ansa | Xabier Artola | Nerea Ezeiza | Koldo Gojenola | Ruben Urizar
Proceedings of the Workshop on Multiword Expressions: Integrating Processing

A XML-Based Term Extraction Tool for Basque
I. Alegria | A. Gurrutxaga | P. Lizaso | X. Saralegi | S. Ugartetxea | R. Urizar
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This project combines linguistic and statistical information to develop a term extraction tool for Basque. Because Basque is an agglutinative and highly inflected language, the treatment of morphosyntactic information is vital. In addition, due to the late unification of the language, texts show greater term dispersion than in a highly normalized language. The result is a semi-automatic terminology extraction tool based on XML, for use in managing technical and scientific information.

1998

Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages
N. Ezeiza | I. Alegria | J.M. Arriola | R. Urizar | I. Aduriz
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages
N. Ezeiza | I. Alegria | J.M. Arriola | R. Urizar | I. Aduriz
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics