Jon Alkorta


2023

Towards automatic essay scoring of Basque language texts from a rule-based approach based on curriculum-aware systems
Jose Maria Arriola | Mikel Iruskieta | Ekain Arrieta | Jon Alkorta
Proceedings of the NoDaLiDa 2023 Workshop on Constraint Grammar - Methods, Tools and Applications

Although the Basque Education Law mentions that students must finish secondary compulsory education at B2 Basque level and their undergraduate studies at the C1 level, there are no objective tests or tools that can discriminate between these levels. This work presents the first rule-based method to grade written Basque learner texts. We adapt the adult Basque learner curriculum based on the CEFR to create a rule-based grammar for Basque. This paper summarises the results obtained in different classification tasks by combining information formalised through CG3 and different machine learning algorithms used in text classification. Besides, we perform a manual evaluation of the grammar. Finally, we discuss the informativeness of these rules and some ways to further improve assisted text grading and combine rule-based approaches with other approaches based on readability and complexity measures.

2022

Adding the Basque Parliament Corpus to ParlaMint Project
Jon Alkorta | Mikel Iruskieta Quintian
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

The aim of this work is to describe the collection created from transcripts of Basque parliamentary speeches. This corpus follows the constraints of the ParlaMint project. The Basque ParlaMint corpus consists of two versions: the first represents what was said in the Basque Parliament, that is, the original bilingual corpus in Basque and Spanish, to analyse what was said and how it was said, while the second is only in Basque, with the original and translated passages, to promote studies on the content of the parliamentary speeches.

2020

Exploring the Enrichment of Basque WordNet with a Sentiment Lexicon
Itziar Gonzalez-Dios | Jon Alkorta
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

Wordnets are lexical databases where the semantic relations of words and concepts are established. These resources are useful for many NLP tasks, such as automatic text classification, word-sense disambiguation or machine translation. In comparison with other wordnets, the Basque version is smaller and some PoS are underrepresented or missing, e.g. adjectives and adverbs. In this work, we explore a novel approach to enrich the Basque WordNet, focusing on the adjectives. We want to prove the use and effectiveness of sentiment lexicons to enrich the resource without the need of starting from scratch. Using as complementary resources one dictionary and the sentiment valences of the words, we check if the word of the lexicon matches with the meaning of the synset, and if it matches we add the word as variant to the Basque WordNet. Following this methodology, we describe the most frequent adjectives with positive and negative valence, the matches and the possible solutions for the non-matches.

2019

Towards discourse annotation and sentiment analysis of the Basque Opinion Corpus
Jon Alkorta | Koldo Gojenola | Mikel Iruskieta
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Discourse information is crucial for a better understanding of the text structure, and it is also necessary to describe which part of an opinionated text is more relevant or to decide how a text span can change the polarity (strengthen or weaken) of another span by means of coherence relations. This work presents the first results on the annotation of the Basque Opinion Corpus using Rhetorical Structure Theory (RST). Our evaluation results and analysis show us the main avenues for improving a future annotation process. We have also extracted the subjectivity of several rhetorical relations, and the results show the effect of sentiment words in relations and the influence of each relation on the semantic orientation value.

2018

Saying no but meaning yes: negation and sentiment analysis in Basque
Jon Alkorta | Koldo Gojenola | Mikel Iruskieta
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In this work, we have analyzed the effects of negation on semantic orientation in Basque. The analysis shows that negation markers can strengthen, weaken or have no effect on the sentiment orientation of a word or a group of words. Using the Constraint Grammar formalism, we have designed and evaluated a set of linguistic rules to formalize these three phenomena. The results show that two phenomena, strengthening and no change, have been identified accurately, and the third one, weakening, with acceptable results.

2017

Using lexical level information in discourse structures for Basque sentiment analysis
Jon Alkorta | Koldo Gojenola | Mikel Iruskieta | Maite Taboada
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms