Czech National Corpus in 2020: Recent Developments and Future Outlook

Michal Kren


Abstract
The paper gives an overview of the state of implementation of the Czech National Corpus (CNC) in all the main areas of its operation: corpus compilation, annotation, application development and user services. As the focus is on recent developments, some areas are described in more detail than others. Close attention is paid to data collection and, in particular, to web application development. This is not only because CNC has recently made significant progress in this area, but also because we believe that end-user web applications shape the way linguists and other scholars think about language data and about the range of possibilities they offer. This consideration is even more important given the variability of the CNC corpora.
Anthology ID:
2020.cmlc-1.8
Volume:
Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Piotr Bański, Adrien Barbaresi, Simon Clematide, Marc Kupietz, Harald Lüngen, Ines Pisetta
Venue:
CMLC
Publisher:
European Language Resources Association
Pages:
52–57
Language:
English
URL:
https://aclanthology.org/2020.cmlc-1.8
Cite (ACL):
Michal Kren. 2020. Czech National Corpus in 2020: Recent Developments and Future Outlook. In Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pages 52–57, Marseille, France. European Language Resources Association.
Cite (Informal):
Czech National Corpus in 2020: Recent Developments and Future Outlook (Kren, CMLC 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2020.cmlc-1.8.pdf