Martin Holub


Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques
Pavel Ircing | Jan Švec | Zbyněk Zajíc | Barbora Hladká | Martin Holub
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We summarize the involvement of our CEMI team in the "NLI Shared Task 2017", which deals with both textual and speech input data. We submitted results from three different system architectures; each of them combines multiple supervised learning models trained on various feature sets. As expected, the systems that use both the textual data and the spoken responses achieve better results. Combining input data from the two modalities led to a rather dramatic improvement in classification performance. Our best-performing method is based on a set of feed-forward neural networks whose hidden-layer outputs are combined using a softmax layer. We achieved a macro-averaged F1 score of 0.9257 on the evaluation (unseen) test set, and our team placed first in the main task together with three other teams.
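The macro-averaged F1 score reported above is the unweighted mean of the per-class F1 scores. Each native language therefore counts equally, however many responses it has. The following is a minimal sketch of that metric, not the shared task's official scorer. The three-letter label codes in the usage example are hypothetical.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical example with two L1 classes.
gold = ["ARA", "ARA", "CHI", "CHI"]
pred = ["ARA", "CHI", "CHI", "CHI"]
print(macro_f1(gold, pred))  # (2/3 + 4/5) / 2 = 11/15
```

Because every class contributes equally, a system that performs poorly on a rare L1 is penalized more heavily than it would be under accuracy or micro-averaged F1.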


Verb sense disambiguation in Machine Translation
Roman Sudarikov | Ondřej Dušek | Martin Holub | Ondřej Bojar | Vincent Kríž
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

We describe experiments in machine translation using word sense disambiguation (WSD) information. This work focuses on WSD in verbs, based on two different approaches: verbal patterns based on Corpus Pattern Analysis and verbal word senses from valency frames. We evaluate several options of using verb senses in the source-language sentences as an additional factor for the Moses statistical machine translation system. Our results show a statistically significant improvement in translation quality, as measured by BLEU, for the valency-frames approach; in manual evaluation, both WSD methods bring improvements.


Feature Extraction for Native Language Identification Using Language Modeling
Vincent Kríž | Martin Holub | Pavel Pecina
Proceedings of the International Conference Recent Advances in Natural Language Processing


Feature Engineering in the NLI Shared Task 2013: Charles University Submission Report
Barbora Hladká | Martin Holub | Vincent Kríž
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Rule-Based Extraction of English Verb Collocates from a Dependency-Parsed Corpus
Silvie Cinková | Martin Holub | Ema Krejčová | Lenka Smejkalová
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)


Managing Uncertainty in Semantic Tagging
Silvie Cinková | Martin Holub | Vincent Kríž
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Tailored Feature Extraction for Lexical Disambiguation of English Verbs Based on Corpus Pattern Analysis
Martin Holub | Vincent Kríž | Silvie Cinková | Eckhard Bick
Proceedings of COLING 2012

A database of semantic clusters of verb usages
Silvie Cinková | Martin Holub | Adam Rambousek | Lenka Smejkalová
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present VPS-30-En, a small lexical resource that contains the following 30 English verbs: access, ally, arrive, breathe, claim, cool, crush, cry, deny, enlarge, enlist, forge, furnish, hail, halt, part, plough, plug, pour, say, smash, smell, steer, submit, swell, tell, throw, trouble, wake and yield. We created VPS-30-En and have been using it to explore the interannotator agreement potential of Corpus Pattern Analysis. VPS-30-En is a small snapshot of the Pattern Dictionary of English Verbs (Hanks and Pustejovsky, 2005), which we revised (both the entries and the annotated concordances) and enhanced with additional annotations. It is freely available at In this paper, we compare the annotation scheme of VPS-30-En with the original PDEV. We also describe the adjustments we have made and their motivation, as well as the most pervasive causes of interannotator disagreement.


Large Scale Experiments for Semantic Labeling of Noun Phrases in Raw Text
Louise Guthrie | Roberto Basili | Fabio Zanzotto | Kalina Bontcheva | Hamish Cunningham | David Guthrie | Jia Cui | Marco Cammisa | Jerry Cheng-Chieh Liu | Cassia Farria Martin | Kristiyan Haralambiev | Martin Holub | Klaus Macherey | Fredrick Jelinek
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Searching for Topics in a Large Collection of Texts
Martin Holub | Jiří Semecký | Jiří Diviš
Proceedings of the ACL Student Research Workshop


Use of Dependency Tree Structures for the Microcontext Extraction
Martin Holub | Alena Bohmova
ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval