2024
Speech Technology Services for Oral History Research
Christoph Draxler | Henk van den Heuvel | Arjan van Hessen | Pavel Ircing | Jan Lehečka
Proceedings of the First Workshop on Holocaust Testimonies as Language Resources (HTRes) @ LREC-COLING 2024
Oral history is about oral sources of witnesses and commentators on historical events. Speech technology is an important instrument for processing such recordings in order to obtain transcriptions and further enhancements that structure the oral account. In this contribution we address the transcription portal and the web services associated with speech processing at BAS, speech solutions developed at LINDAT, how to do it yourself with Whisper, remaining challenges, and future developments.
2018
Towards Processing of the Oral History Interviews and Related Printed Documents
Zbyněk Zajíc | Lucie Skorkovská | Petr Neduchal | Pavel Ircing | Josef V. Psutka | Marek Hrúz | Aleš Pražák | Daniel Soutner | Jan Švec | Lukáš Bureš | Luděk Müller
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Design and Development of Speech Corpora for Air Traffic Control Training
Luboš Šmídl | Jan Švec | Daniel Tihelka | Jindřich Matoušek | Jan Romportl | Pavel Ircing
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2017
Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques
Pavel Ircing | Jan Švec | Zbyněk Zajíc | Barbora Hladká | Martin Holub
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
We summarize the involvement of our CEMI team in the "NLI Shared Task 2017", which deals with both textual and speech input data. We submitted the results achieved by using three different system architectures; each of them combines multiple supervised learning models trained on various feature sets. As expected, better results are achieved with the systems that use both the textual data and the spoken responses. Combining the input data of two different modalities led to a rather dramatic improvement in classification performance. Our best performing method is based on a set of feed-forward neural networks whose hidden-layer outputs are combined using a softmax layer. We achieved a macro-averaged F1 score of 0.9257 on the evaluation (unseen) test set and our team placed first in the main task together with three other teams.
2008
Dialogue, Speech and Images: the Companions Project Data Set
Yorick Wilks | David Benyon | Christopher Brewster | Pavel Ircing | Oli Mival
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper describes part of the corpus collection efforts underway in the EC-funded Companions project. The Companions project is collecting substantial quantities of dialogue, a large part of which focuses on reminiscing about photographs. The texts are in English and Czech. We describe the context and objectives for which this dialogue corpus is being collected and the methodology being used, and make observations on the resulting data. The corpora will be made available to the wider research community through the Companions Project web site.
2006
Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech
Pavel Ircing | Jan Hoidekr | Josef Psutka
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In our paper, we present a method for incorporating available linguistic information into a statistical language model that is used in an ASR system for transcribing spontaneous speech. We employ the class-based language model paradigm and use the morphological tags as the basis for word-to-class mapping. Since the number of different tags is at least one order of magnitude lower than the number of words, even in tasks with moderately sized vocabularies, the tag-based model can be estimated rather robustly even from relatively small text corpora. Unfortunately, this robustness goes hand in hand with the restricted predictive ability of the class-based model. Hence we apply a two-pass recognition strategy, where the first pass is performed with a standard word-based n-gram model and the resulting lattices are rescored in the second pass using the aforementioned class-based model. Using this decoding scenario, we have managed to moderately improve the word error rate in the performed ASR experiments.
2004
Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH project
Josef Psutka | Pavel Ircing | Jan Hajič | Vlasta Radová | Josef V. Psutka | William J. Byrne | Samuel Gustman
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)