Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition

Petya Osenova; Kiril Simov

Abstract

The paper discusses fine-tuned models for the tasks of part-of-speech tagging and named entity recognition. Fine-tuning was performed on the basis of an existing pre-trained BERT model and two newly pre-trained BERT models for Bulgarian. The resulting models are cross-tested on the Bulgarian part of the ParlaMint corpora as a new domain. In addition, the performance of the new fine-tuned BERT models is compared with the available results of the Stanza-based model that was used to annotate the Bulgarian part of the ParlaMint corpora. The observations reveal the weaknesses of each model as well as the challenges common to all of them.
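The abstract describes comparing several taggers against a test set but does not name the evaluation metrics. The sketch below assumes two common choices: token-level accuracy for part-of-speech tagging, and exact-match entity-level F1 over BIO-encoded tags for named entity recognition. These are illustrative assumptions, not necessarily the authors' setup. The function names and the tag labels in the example (e.g. PER, ORG, LOC, and UD-style POS tags) are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span representation: (start, end_exclusive, entity_type)
Span = Tuple[int, int, str]


def pos_accuracy(gold: List[str], pred: List[str]) -> float:
    """Token-level accuracy: fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert BIO tags (e.g. B-PER, I-PER, O) into a set of entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current entity continues
        if label is not None and start is not None:
            spans.add((start, i, label))  # close the open entity
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # An I- tag without a matching open entity is treated as a new entity.
            start, label = i, tag[2:]
    return spans


def entity_f1(gold_tags: List[str], pred_tags: List[str]) -> float:
    """Exact-match entity F1: a predicted span counts only if boundaries and type match."""
    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(pos_accuracy(["PROPN", "VERB", "ADP", "NOUN", "PUNCT"],
                       ["PROPN", "VERB", "ADP", "ADJ", "PUNCT"]))  # 0.8
    print(entity_f1(["B-PER", "I-PER", "O", "B-ORG", "O"],
                    ["B-PER", "I-PER", "O", "B-LOC", "O"]))  # 0.5
```

In the usage example, the NER prediction finds the person span correctly but assigns the wrong type to the second entity. Under exact matching, that yields a precision and recall of 0.5 each. Looser schemes (e.g. partial-overlap credit) would score this differently.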

Anthology ID:: 2024.parlaclarin-1.4
Volume:: Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
Month:: May
Year:: 2024
Address:: Torino, Italia
Editors:: Darja Fišer, Maria Eskevich, David Bordon
Venues:: ParlaCLARIN | WS
Publisher:: ELRA and ICCL
Pages:: 30–35
URL:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.parlaclarin-1.4/
Cite (ACL):: Petya Osenova and Kiril Simov. 2024. Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition. In Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, pages 30–35, Torino, Italia. ELRA and ICCL.
Cite (Informal):: Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition (Osenova & Simov, ParlaCLARIN 2024)
PDF:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.parlaclarin-1.4.pdf