Valentin Badea

2024

A Cross-model Study on Learning Romanian Parts of Speech with Transformer Models
Radu Ion | Verginica Barbu Mititelu | Vasile Păiş | Elena Irimia | Valentin Badea
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)

This paper attempts to determine experimentally whether POS tagging of unseen words achieves accuracy comparable to that obtained on words that were rarely seen in the training set (i.e. with a frequency below 5) or more frequently seen (i.e. with a frequency above 10). To compare accuracies objectively, we use the odds ratio statistic with confidence-interval testing to show that the odds of being correct on unseen words are close to the odds of being correct on rarely seen words. To train the POS taggers, we use several Romanian BERT models that are freely available on HuggingFace.
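The odds-ratio comparison described above can be illustrated with a small sketch. It assumes Woolf's log method for the 95% confidence interval (the abstract does not say which interval method the paper uses), and the correct/incorrect counts are hypothetical, not results from the paper. If the interval contains 1, the odds of tagging a word correctly do not differ significantly between the two frequency groups.

```python
import math


def odds_ratio_ci(correct_a, wrong_a, correct_b, wrong_b, z=1.96):
    """Odds ratio of being correct in group A vs. group B, with a Woolf (log) CI.

    Assumes all four counts are positive (no zero-cell correction is applied).
    z = 1.96 gives an approximate 95% confidence interval.
    """
    odds_ratio = (correct_a * wrong_b) / (wrong_a * correct_b)
    # Standard error of log(OR) under Woolf's method
    se = math.sqrt(1 / correct_a + 1 / wrong_a + 1 / correct_b + 1 / wrong_b)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts (illustration only, not from the paper):
# unseen words: 900 tagged correctly, 100 tagged wrongly
# rarely seen words (frequency below 5): 1820 correct, 180 wrong
or_, lo, hi = odds_ratio_ci(900, 100, 1820, 180)
print(f"OR = {or_:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
if lo <= 1.0 <= hi:
    print("CI contains 1: no significant difference in the odds of being correct")
else:
    print("CI excludes 1: the odds of being correct differ significantly")
```

Working with odds rather than raw accuracies lets the two groups be compared through a single ratio whose sampling distribution is well approximated on the log scale, which is what makes the interval test straightforward.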

2022


The paper presents an open-domain Question Answering (QA) system for Romanian that answers COVID-19-related questions. The QA pipeline comprises automatic question processing, automatic query generation, a web search for the 10 most relevant documents, and answer extraction using a BERT model fine-tuned for extractive QA on a COVID-19 data set that we created manually. The paper presents the QA system and its integration into RELATE, the Romanian language technologies portal, together with the COVID-19 data set and several evaluations of QA performance.
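The answer-extraction step can be sketched as follows, assuming the standard BERT extractive-QA head, which outputs a start logit and an end logit per token; the answer is the span maximizing their sum with start ≤ end and a length cap. The function name, the `max_answer_len` parameter, and the tokens and logits below are hypothetical illustrations, not taken from the system described in the paper.

```python
from typing import List, Tuple


def best_answer_span(start_logits: List[float], end_logits: List[float],
                     max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring valid span.

    A span is valid if start <= end and it covers at most max_answer_len tokens.
    The score is start_logit + end_logit, as in a typical BERT extractive-QA head.
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Hypothetical tokens and logits for a single retrieved passage
tokens = ["the", "vaccine", "requires", "two", "doses", "."]
start = [0.1, 0.2, -1.0, 3.0, 0.5, -2.0]
end = [-1.0, 0.0, -0.5, 1.0, 2.5, 0.0]
s, e, score = best_answer_span(start, end)
print(" ".join(tokens[s:e + 1]))
```

With several retrieved documents, one option is to score each passage this way and keep the highest-scoring span overall; how the system described here aggregates answers across its 10 documents is not stated in the abstract.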

Co-authors

Maria Mitrofan (1)

Venues

CLIB (2)
