Kepa Sarasola

Also published as: K Sarasola, K. Sarasola


2018

The ADAPT System Description for the IWSLT 2018 Basque to English Translation Task
Alberto Poncelas | Andy Way | Kepa Sarasola
Proceedings of the 15th International Conference on Spoken Language Translation

In this paper we present the ADAPT system built for the Basque to English Low Resource MT Evaluation Campaign. Basque is a low-resourced, morphologically rich language. This poses a challenge for Neural Machine Translation models, which usually achieve better performance when trained with large sets of data. Accordingly, we used synthetic data to improve the translation quality produced by a model built using only authentic data. Our proposal uses back-translated data to: (a) create new sentences, so the system can be trained with more data; and (b) translate sentences that are close to the test set, so the model can be fine-tuned to the document to be translated.

Konbitzul: an MWE-specific database for Spanish-Basque
Uxoa Iñurrieta | Itziar Aduriz | Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Massively multilingual accessible audioguides via cell phones
Itziar Cortes | Igor Leturia | Iñaki Alegria | Aitzol Astigarraga | Kepa Sarasola | Manex Garaio
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

Bidaide is a web service that lets visitors to a museum, route or building read or listen to explanations about the place they are visiting, on their own mobile phone and in their own language. Visitors can access the explanations in several ways: by scanning QR codes placed on site, by GPS positioning (on outdoor routes), or by automatic Bluetooth proximity activation. This makes the service accessible to people with low or no vision. The platform also provides the manager of the visited site with state-of-the-art language resources for creating the texts and audio of the explanations in many languages.

2017

Rule-Based Translation of Spanish Verb-Noun Combinations into Basque
Uxoa Iñurrieta | Itziar Aduriz | Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque. Linguistic information about a set of VNCs is gathered from the public database Konbitzul and integrated into the MT system, leading to improved BLEU, NIST and TER scores; human evaluators also judged the results to be clearly better.

2016

Using Linguistic Data for English and Spanish Verb-Noun Combination Identification
Uxoa Iñurrieta | Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola | Itziar Aduriz | John Carroll
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method to use this information to improve VNC identification. Firstly, a sample of frequent VNCs is analysed in depth and tagged along lexico-semantic and morphosyntactic dimensions, with satisfactory inter-annotator agreement scores. Then, a VNC identification experiment is carried out in which the analysed linguistic data is combined with chunking information and syntactic dependencies. A comparison between the results of the experiment and the results obtained by a basic detection method shows that VNC identification can be greatly improved by using linguistic information, as a large number of additional occurrences are detected with high precision.

Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
Gorka Labaka | Iñaki Alegria | Kepa Sarasola
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents how a state-of-the-art SMT system is enriched with additional in-domain parallel corpora extracted from Wikipedia. We collect corpora from parallel titles and from parallel fragments in comparable Wikipedia articles. We carried out an evaluation with a double objective: assessing the quality of the extracted data and measuring the improvement due to domain adaptation. We think this can be very useful for languages with a limited amount of parallel corpora, where in-domain data is crucial to improving the performance of MT systems. In experiments on the Spanish-English language pair, the system improves on a baseline trained with the Europarl corpus by more than 2 BLEU points when translating in the Computer Science domain.

2015

Exploiting portability to build an RBMT prototype for a new source language
Nora Aranberri | Gorka Labaka | Arantza Díaz de Ilarraza | Kepa Sarasola
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Building hybrid machine translation systems by using an EBMT preprocessor to create partial translations
Mikel Artetxe | Gorka Labaka | Kepa Sarasola
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Comparison of post-editing productivity between professional translators and lay users
Nora Aranberri | Gorka Labaka | Arantza Diaz de Ilarraza | Kepa Sarasola
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas

This work compares the post-editing productivity of professional translators and lay users. We integrate an English to Basque MT system within Bologna Translation Service, an end-to-end translation management platform, and perform a productivity experiment in a real working environment. Six translators and six lay users translate or post-edit two texts from English into Basque. Results suggest that, overall, post-editing increases translation throughput for both translators and users, although the latter seem to benefit more from the MT output. We observe that translators and users perceive MT differently. Additionally, a preliminary analysis suggests that familiarity with the domain, source text complexity and MT quality might affect potential productivity gain.

2012

Contribution of Complex Lexical Information to Solve Syntactic Ambiguity in Basque
Aitziber Atutxa | Eneko Agirre | Kepa Sarasola
Proceedings of COLING 2012

Deep evaluation of hybrid architectures: use of different metrics in MERT weight optimization
Cristina España-Bonet | Gorka Labaka | Arantza Díaz de Ilarraza | Lluís Màrquez | Kepa Sarasola
Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation

2009

Reordering on Spanish-Basque SMT
Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola
Proceedings of Machine Translation Summit XII: Posters

Use of Rich Linguistic Information to Translate Prepositions and Grammar Cases to Basque
Eneko Agirre | Aitziber Atutxa | Gorka Labaka | Mikel Lersundi | Aingeru Mayor | Kepa Sarasola
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

Relevance of Different Segmentation Options on Spanish-Basque SMT
Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

Matxin: developing sustainable machine translation for a less-resourced language
Kepa Sarasola
Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation

2008

Spanish-to-Basque MultiEngine Machine Translation for a Restricted Domain
Iñaki Alegria | Arantza Casillas | Arantza Diaz de Ilarraza | Jon Igartua | Gorka Labaka | Mikel Lersundi | Aingeru Mayor | Kepa Sarasola
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

We present our initial strategy for Spanish-to-Basque MultiEngine Machine Translation, a language pair with very different structure and word order and for which no large parallel corpus is available. This hybrid proposal is based on the combination of three different MT paradigms: Example-Based MT, Statistical MT and Rule-Based MT. We have evaluated the system, reporting automatic evaluation metrics for a corpus in a test domain. The first results obtained are encouraging.

Strategies for sustainable MT for Basque: incremental design, reusability, standardization and open-source
I. Alegria | X. Arregi | X. Artola | A. Diaz de Ilarraza | G. Labaka | M. Lersundi | A. Mayor | K. Sarasola
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

2007

Comparing rule-based and data-driven approaches to Spanish-to-Basque machine translation
Gorka Labaka | Nicolas Stroppa | Andy Way | Kepa Sarasola
Proceedings of Machine Translation Summit XI: Papers

2006

Example-Based Machine Translation of the Basque Language
Nicolas Stroppa | Declan Groves | Andy Way | Kepa Sarasola
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Basque is both a minority and a highly inflected language with free order of sentence constituents. Machine Translation of Basque is thus both a real need and a test bed for MT techniques. In this paper, we present a modular Data-Driven MT system which includes different chunkers as well as chunk aligners which can deal with the free order of sentence constituents of Basque. We conducted Basque to English translation experiments, evaluated on a large corpus (270,000 sentence pairs). The experimental results show that our system significantly outperforms state-of-the-art approaches according to several common automatic evaluation metrics.

2005

An open-source shallow-transfer machine translation engine for the Romance languages of Spain
Antonio M. Corbi-Bellot | Mikel L. Forcada | Sergio Ortíz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Iñaki Alegria | Aingeru Mayor | Kepa Sarasola
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

An Open Architecture for Transfer-based Machine Translation between Spanish and Basque
Iñaki Alegria | Arantza Diaz de Ilarraza | Gorka Labaka | Mikel Lersundi | Aingeru Mayor | Kepa Sarasola | Mikel L. Forcada | Sergio Ortiz-Rojas | Lluís Padró
Workshop on open-source machine translation

We present the current status of development of an open architecture for translation from Spanish into Basque. The machine translation architecture uses an open-source analyser for Spanish and new modules mainly based on finite-state transducers. The project is integrated in the OpenTrad initiative, a larger government-funded project shared among different universities and small companies, which will also include MT engines for translation among the main languages in Spain. The main objective is the construction of an open, reusable and interoperable framework. This paper describes the design of the engine, the formats it uses for communication among the modules, the modules reused from another project named Matxin, and the new modules we are building.

2004

Exploring Portability of Syntactic Information from English to Basque
Eneko Agirre | Aitziber Atutxa | Koldo Gojenola | Kepa Sarasola
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Learning Argument/Adjunct Distinction for Basque
Izaskun Aldezabal | Maxux Aranzabe | Koldo Gojenola | Kepa Sarasola | Aitziber Atutxa
Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition

Semiautomatic Labelling of Semantic Features
Arantza Díaz de Ilarraza | Aingeru Mayor | Kepa Sarasola
COLING 2002: The 19th International Conference on Computational Linguistics

2000

Building a lexicon for an English-Basque MT system from heterogeneous wide-coverage dictionaries
Arantxa Diaz de Ilarraza | Aingeru Mayor | Kepa Sarasola
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

Reusability of wide-coverage linguistic resources in the construction of multilingual technical documentation
Arantxa Diaz de Ilarraza | Aingeru Mayor | Kepa Sarasola
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

A word-grammar based morphological analyzer for agglutinative languages
I. Aduriz | E. Agirre | I. Aldezabal | I. Alegria | X. Arregi | J. M. Arriola | X. Artola | K. Gojenola | A. Maritxalar | K. Sarasola | M. Urkia
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

A Bootstrapping Approach to Parser Development
Izaskun Aldezabal | Koldo Gojenola | Kepa Sarasola
Proceedings of the Sixth International Workshop on Parsing Technologies

This paper presents a robust parsing system for unrestricted Basque texts. It analyzes a sentence in two stages: a unification-based parser builds basic syntactic units such as NPs, PPs, and sentential complements, while a finite-state parser performs syntactic disambiguation and filtering of the results. The system has been applied to the acquisition of verbal subcategorization information, obtaining 66% recall and 87% precision in the determination of verb subcategorization instances. This information will later be incorporated into the parser in order to improve its performance.

A Word-level Morphosyntactic Analyzer for Basque
I. Aduriz | E. Agirre | I. Aldezabal | X. Arregi | J. M. Arriola | X. Artola | K. Gojenola | A. Maritxalar | K. Sarasola | M. Urkia
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

Towards a single proposal in spelling correction
Eneko Agirre | Koldo Gojenola | Kepa Sarasola | Atro Voutilainen
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Towards a Single Proposal in Spelling Correction
Eneko Agirre | Koldo Gojenola | Kepa Sarasola | Atro Voutilainen
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1994

Lexical Knowledge Representation in an Intelligent Dictionary Help System
E. Agirre | X. Arregi | X. Artola | A. Diaz de Ilarraza | K. Sarasola
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

1993

A Morphological Analysis Based Method for Spelling Correction
I. Aduriz | E. Agirre | I. Alegria | X. Arregi | J. M. Arriola | X. Artola | A. Diaz de Ilarraza | N. Ezeiza | M. Maritxalar | K. Sarasola | M. Urkia
Sixth Conference of the European Chapter of the Association for Computational Linguistics

1992

XUXEN: A Spelling Checker/Corrector for Basque Based on Two-Level Morphology
E. Agirre | I. Alegria | X. Arregi | X. Artola | A. Diaz de Ilarraza | M. Maritxalar | K. Sarasola | M. Urkia
Third Conference on Applied Natural Language Processing