Kepa Bengoetxea


LagunTest: A NLP Based Application to Enhance Reading Comprehension
Kepa Bengoetxea | Itziar Gonzalez-Dios | Amaia Aguirregoitia
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

The ability to read and understand written texts plays an important role in education, above all in the last years of primary education. This is especially pertinent in language immersion educational programmes, where some students have low linguistic competence in the languages of instruction. In this context, adapting texts to the individual needs of each student requires considerable effort from education professionals. However, language technologies can facilitate the laborious adaptation of materials in order to enhance reading comprehension. In this paper, we present LagunTest, an NLP-based application that takes as input a text in Basque or English and offers synonyms, definitions, and examples of the words in different contexts, and presents some linguistic characteristics as well as visualizations. LagunTest is based on reusable and open multilingual and multimodal tools, and it is also distributed under an open license. LagunTest is intended to ease the burden on education professionals in the task of adapting materials, and its output should always be supervised by them.


Multilingual segmentation based on neural networks and pre-trained word embeddings
Mikel Iruskieta | Kepa Bengoetxea | Aitziber Atutxa Salazar | Arantza Diaz de Ilarraza
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

The DISRPT 2019 workshop organized a shared task aiming to identify discourse segments across formalisms and languages. Elementary Discourse Units (EDUs) are quite similar across different theories. Segmentation is the very first stage of rhetorical annotation. Still, each annotation project has made several decisions, with consequences not only for the annotation of the relational discourse structure but also for the segmentation stage. In this shared task, we employed pre-trained word embeddings and neural networks (BiLSTM+CRF) to perform the segmentation. We report F1 results for 6 languages: Basque (0.853), English (0.919), French (0.907), German (0.913), Portuguese (0.926) and Spanish (0.868 and 0.769). Finally, we also carried out an error analysis based on clause typology for Basque and Spanish, in order to understand the performance of the segmenter.


On WordNet Semantic Classes and Dependency Parsing
Kepa Bengoetxea | Eneko Agirre | Joakim Nivre | Yue Zhang | Koldo Gojenola
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


Combining Rule-Based and Statistical Syntactic Analyzers
Iakes Goenaga | Koldobika Gojenola | María Jesús Aranzabe | Arantza Díaz de Ilarraza | Kepa Bengoetxea
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages


Testing the Effect of Morphological Disambiguation in Dependency Parsing of Basque
Kepa Bengoetxea | Arantza Casillas | Koldo Gojenola
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages

Improving Dependency Parsing with Semantic Classes
Eneko Agirre | Kepa Bengoetxea | Koldo Gojenola | Joakim Nivre
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies


Application of Different Techniques to Dependency Parsing of Basque
Kepa Bengoetxea | Koldo Gojenola
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages


Exploring Treebank Transformations in Dependency Parsing
Kepa Bengoetxea | Koldo Gojenola
Proceedings of the International Conference RANLP-2009

Application of feature propagation to dependency parsing
Kepa Bengoetxea | Koldo Gojenola
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)