Barbora Hladká

Also published as: Barbora Hladka, B. Hladká

2026

SouDeC: Source Detection and Classification in Czech
Jiří Mírovský | Barbora Hladka
Proceedings of the Fifteenth Language Resources and Evaluation Conference

We present a method of attribution source detection and classification in Czech. A plain text (typically, a newspaper article) enters the SouDec system, gets parsed with the external tool UDPipe into Universal-Dependencies style of sentence representation, and then is analyzed for occurrences of attribution signals and sources. The list of attribution signals has been extracted from a corpus of Czech newspaper articles annotated with interlinked attribution signals and sources, and has been complemented with context and syntax information to help distinguish relevant occurrences of the signals. The SouDec system further classifies the attribution sources in one of five classes: anonymous, partially anonymous, unofficial, official non-political and official political, using information from another external tool, a recognizer and classifier of named entities, NameTag 3. While our source detection method gets results comparable to existing systems for other languages, further improvements can be achieved by incorporating fully-fledged automatic coreference resolution into the classification method. In a focused case study, we test a possible usage of SouDeC for distinguishing domain-specific texts of less vs. more reputable origin.

2022

pdf bib abs

Annotating Attribution in Czech News Server Articles
Barbora Hladka | Jiří Mírovský | Matyáš Kopp | Václav Moravec
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper focuses on detection of sources in the Czech articles published on a news server of Czech public radio. In particular, we search for attribution in sentences and we recognize attributed sources and their sentence context (signals). We organized a crowdsourcing annotation task that resulted in a data set of 2,167 stories with manually recognized signals and sources. In addition, the sources were classified into the classes of named and unnamed sources.

2020

pdf bib abs

Compiling Czech Parliamentary Stenographic Protocols into a Corpus
Barbora Hladka | Matyáš Kopp | Pavel Straňák
Proceedings of the Second ParlaCLARIN Workshop

The Parliament of the Czech Republic consists of two chambers: the Chamber of Deputies (Lower House) and the Senate (Upper House). In our work, we focus on agenda and documents that relate to the Chamber of Deputies exclusively. We pay particular attention to stenographic protocols that record the Chamber of Deputies’ meetings. Our overall goal is to (1) compile the protocols into a ParlaCLARIN TEI encoded corpus, (2) make this corpus accessible and searchable in the TEITOK web-based platform, (3) annotate the corpus using the modules available in TEITOK, e.g. detect and recognize named entities, and (4) highlight the annotations in TEITOK. In addition, we add two more goals that we consider innovative: (5) update the corpus every time a new stenographic protocol is published online by the Chambers of Deputies and (6) expose the annotations as the linked open data in order to improve the protocols’ interoperability with other existing linked open data. This paper is devoted to the goals (1) and (5).

Barbora Hladká

2026

2022

2020

2018

2017

2016

2015

2014

2013

2012

2009

2008

2005

2001

2000

1998

1997

Co-authors

Venues