Maarten Versteegh

2019

pdf abs
Cross-lingual intent classification in a low resource industrial setting
Talaat Khalil | Kornel Kiełczewski | Georgios Christos Chouliaras | Amina Keldibek | Maarten Versteegh
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper explores different approaches to multilingual intent classification in a low resource setting. Recent advances in multilingual text representations promise cross-lingual transfer for classifiers. We investigate the potential for this transfer in an applied industrial setting and compare to multilingual classification using machine translated text. Our results show that while the recently developed methods show promise, practical application calls for a combination of techniques for useful results.

2016

pdf
Learning Text Similarity with Siamese Recurrent Networks
Paul Neculoiu | Maarten Versteegh | Mihai Rotaru
Proceedings of the 1st Workshop on Representation Learning for NLP

2014

pdf abs
Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems
Bogdan Ludusan | Maarten Versteegh | Aren Jansen | Guillaume Gravier | Xuan-Nga Cao | Mark Johnson | Emmanuel Dupoux
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or from raw speech has seen an increasing interest in the past years both from a theoretical and a practical standpoint. Yet, there exists no common accepted evaluation method for the systems performing term discovery. Here, we propose such an evaluation toolbox, drawing ideas from both speech technology and natural language processing. We first transform the speech-based output into a symbolic representation and compute five types of evaluation metrics on this representation: the quality of acoustic matching, the quality of the clusters found, and the quality of the alignment with real words (type, token, and boundary scores). We tested our approach on two term discovery systems taking speech as input, and one using symbolic input. The latter was run using both the gold transcription and a transcription obtained from an automatic speech recognizer, in order to simulate the case when only imperfect symbolic information is available. The results obtained are analysed through the use of the proposed evaluation metrics and the implications of these metrics are discussed.

Co-authors

Georgios Christos Chouliaras 1

Amina Keldibek 1