Jan Švec


Towards Processing of the Oral History Interviews and Related Printed Documents
Zbyněk Zajíc | Lucie Skorkovská | Petr Neduchal | Pavel Ircing | Josef V. Psutka | Marek Hrúz | Aleš Pražák | Daniel Soutner | Jan Švec | Lukáš Bureš | Luděk Müller
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Design and Development of Speech Corpora for Air Traffic Control Training
Luboš Šmídl | Jan Švec | Daniel Tihelka | Jindřich Matoušek | Jan Romportl | Pavel Ircing
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques
Pavel Ircing | Jan Švec | Zbyněk Zajíc | Barbora Hladká | Martin Holub
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We summarize the involvement of our CEMI team in the "NLI Shared Task 2017", which deals with both textual and speech input data. We submitted results obtained with three different system architectures; each of them combines multiple supervised learning models trained on various feature sets. As expected, better results are achieved with the systems that use both the textual data and the spoken responses. Combining the input data of two different modalities led to a rather dramatic improvement in classification performance. Our best-performing method is based on a set of feed-forward neural networks whose hidden-layer outputs are combined using a softmax layer. We achieved a macro-averaged F1 score of 0.9257 on the evaluation (unseen) test set, and our team placed first in the main task together with three other teams.
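
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro-averaged F1 score can be computed: the F1 score is calculated per class and then averaged without weighting by class size. The function name `macro_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper or from the shared task data.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but the item belongs elsewhere
            fn[g] += 1  # gold class g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (native-language codes), not shared-task data.
gold = ["GER", "GER", "ITA", "ITA", "SPA", "SPA"]
pred = ["GER", "ITA", "ITA", "ITA", "SPA", "GER"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, a system cannot reach a high macro-averaged score by performing well only on the most frequent classes.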


Structural Metadata Annotation of Speech Corpora: Comparing Broadcast News and Broadcast Conversations
Jáchym Kolář | Jan Švec
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Structural metadata extraction (MDE) research aims to develop techniques for automatic conversion of raw speech recognition output to forms that are more useful to humans and to downstream automatic processes. This may be achieved by inserting boundaries of syntactic/semantic units into the flow of speech, labeling non-content words like filled pauses and discourse markers for optional removal, and identifying sections of disfluent speech. This paper compares two Czech MDE speech corpora, one in the domain of broadcast news and the other in the domain of broadcast conversations. A variety of statistics about fillers, edit disfluencies, and syntactic/semantic units are presented. In addition, it is reported that disfluent portions of speech show differences in the distribution of parts of speech (POS) of their content in comparison with the general POS distribution. The two Czech corpora are compared not only with each other, but also with available numbers relating to English MDE corpora of broadcast news and telephone conversations.
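
As a rough illustration of how such annotation can be used downstream, the sketch below takes a token stream that already carries MDE-style labels and produces cleaned text units, dropping fillers and edit disfluencies and splitting at unit boundaries. The `Token` class, its label set ("content", "filler", "edit"), and the `clean_transcript` function are hypothetical names chosen for this example; they do not reflect the annotation format of the corpora described above.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    kind: str = "content"    # hypothetical labels: "content", "filler", "edit"
    ends_unit: bool = False  # True if a syntactic/semantic unit boundary follows


def clean_transcript(tokens, drop=("filler", "edit")):
    """Split tokens into units at boundaries, omitting the dropped kinds."""
    units, current = [], []
    for tok in tokens:
        if tok.kind not in drop:
            current.append(tok.word)
        if tok.ends_unit:
            if current:
                units.append(" ".join(current))
            current = []
    if current:  # trailing words with no closing boundary
        units.append(" ".join(current))
    return units


# Hypothetical English example: "uh i i think it works / you know it does"
tokens = [
    Token("uh", "filler"),
    Token("i", "edit"),
    Token("i"),
    Token("think"),
    Token("it"),
    Token("works", ends_unit=True),
    Token("you know", "filler"),
    Token("it"),
    Token("does", ends_unit=True),
]
cleaned = clean_transcript(tokens)
verbatim = clean_transcript(tokens, drop=())
```

Keeping the removal optional through the `drop` parameter mirrors the idea that fillers are labeled for optional removal: the same annotated stream can yield either a verbatim or a cleaned rendering.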