Prague Dependency Treebank - Consolidated 2.0: Enriching a Complex Annotation Scheme

Marie Mikulová, Jiří Mírovský, Milan Straka, Pavlína Synková, Jan Štěpánek, Barbora Štěpánková, Jan Hajič


Abstract
The Prague Dependency Treebank framework is unique in its attempt to systematically include and link different layers of language, including a meaning representation with several types of inter-sentential phenomena, especially coreference and discourse relations. We present its second consolidated version (PDT-C 2.0), which concludes an almost 30-year-long project of sustained development of the resource into a uniformly and coherently annotated, genre-diversified Czech language resource of almost 4 million tokens, accompanied by fully compatible lexicons. In addition to its use in ongoing linguistic research, the richly annotated corpus is widely used in international comparisons of traditional and novel NLP tools, as well as in conversions into other formalisms. The corpus and the trained parsers are available under the CC BY-NC-SA licence.
Anthology ID: 2026.lrec-main.908
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 11593–11605
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.908/
Cite (ACL): Marie Mikulová, Jiří Mírovský, Milan Straka, Pavlína Synková, Jan Štěpánek, Barbora Štěpánková, and Jan Hajič. 2026. Prague Dependency Treebank - Consolidated 2.0: Enriching a Complex Annotation Scheme. International Conference on Language Resources and Evaluation, main:11593–11605.
Cite (Informal): Prague Dependency Treebank - Consolidated 2.0: Enriching a Complex Annotation Scheme (Mikulová et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.908.pdf