Czech National Corpus in 2020: Recent Developments and Future Outlook

Michal Křen

Abstract

The paper gives an overview of the state of implementation of the Czech National Corpus (CNC) in all the main areas of its operation: corpus compilation, annotation, application development and user services. As the focus is on recent developments, some areas are described in more detail than others. Close attention is paid to data collection and, in particular, to the description of web application development. This is not only because CNC has recently seen significant progress in this area, but also because we believe that end-user web applications shape the way linguists and other scholars think about language data and about the range of possibilities they offer. This consideration is even more important given the variability of the CNC corpora.

Anthology ID: 2020.cmlc-1.8
Volume: Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
Month: May
Year: 2020
Address: Marseille, France
Editors: Piotr Bański, Adrien Barbaresi, Simon Clematide, Marc Kupietz, Harald Lüngen, Ines Pisetta
Venue: CMLC
Publisher: European Language Resources Association
Pages: 52–57
Language: English
URL: https://aclanthology.org/2020.cmlc-1.8
Cite (ACL): Michal Křen. 2020. Czech National Corpus in 2020: Recent Developments and Future Outlook. In Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pages 52–57, Marseille, France. European Language Resources Association.
Cite (Informal): Czech National Corpus in 2020: Recent Developments and Future Outlook (Křen, CMLC 2020)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2020.cmlc-1.8.pdf