Abstract
The paper discusses fine-tuned models for the tasks of part-of-speech tagging and named entity recognition. The fine-tuning was based on an existing pre-trained BERT model and two newly pre-trained BERT models for Bulgarian that are cross-tested on the Bulgarian part of the ParlaMint corpora as a new domain. In addition, the performance of the new fine-tuned BERT models is compared with the available results from the Stanza-based model with which the Bulgarian part of the ParlaMint corpora has been annotated. The observations show the weaknesses of each model as well as the common challenges.
- Anthology ID:
- 2024.parlaclarin-1.4
- Volume:
- Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Darja Fiser, Maria Eskevich, David Bordon
- Venues:
- ParlaCLARIN | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 30–35
- URL:
- https://aclanthology.org/2024.parlaclarin-1.4
- Cite (ACL):
- Petya Osenova and Kiril Simov. 2024. Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition. In Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, pages 30–35, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Bulgarian ParlaMint 4.0 corpus as a testset for Part-of-speech tagging and Named Entity Recognition (Osenova & Simov, ParlaCLARIN-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2024.parlaclarin-1.4.pdf