Jean-Pierre Martens


2010

Improving Proper Name Recognition by Adding Automatically Learned Pronunciation Variants to the Lexicon
Bert Réveil | Jean-Pierre Martens | Henk van den Heuvel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper deals with the task of large vocabulary proper name recognition. In order to accommodate a wide diversity of possible name pronunciations (due to non-native name origins or speaker tongues), a multilingual acoustic model is combined with a lexicon comprising 3 grapheme-to-phoneme (G2P) transcriptions (from G2P transcribers for 3 different languages) and up to 4 so-called phoneme-to-phoneme (P2P) transcriptions. The latter are generated with (speaker tongue, name source) specific P2P converters that try to transform a set of baseline name transcriptions into a pool of transcription variants that lie closer to the 'true' name pronunciations. The experimental results show that the generated P2P variants can be employed to improve name recognition, and that the obtained accuracy is comparable to what is achieved with typical (TY) transcriptions (made by a human expert). Furthermore, it is demonstrated that the P2P conversion can best be instantiated from a baseline transcription in the name source language, and that knowledge of the speaker tongue is an important input as well for the P2P transcription process.

2008

The AUTONOMATA Spoken Names Corpus
Henk van den Heuvel | Jean-Pierre Martens | Bart D’hoore | Kristof D’hanens | Nanneke Konings
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In the Autonomata project we have collected a corpus of spoken name utterances with manually corrected phonemic transcriptions of these utterances. The corpus was designed to become a major resource for the development of automatic speech recognition engines that can achieve a high accuracy on the recognition of person and geographical names spoken in Dutch. The recorded names were selected so as to reveal the major pronunciation variations that a speech recognizer, e.g. in a navigation system with speech input, is going to be confronted with. This includes native speakers speaking foreign names and vice versa.

2006

Development of a phoneme-to-phoneme (p2p) converter to improve the grapheme-to-phoneme (g2p) conversion of names
Qian Yang | Jean-Pierre Martens | Nanneke Konings | Henk van den Heuvel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

It is acknowledged that a good phonemic transcription of proper names is imperative for the success of many modern speech-based services such as directory assistance, car navigation, etc. It is also known that state-of-the-art general-purpose grapheme-to-phoneme (g2p) converters perform rather poorly on many name categories. This paper proposes to use a g2p-p2p tandem comprising a state-of-the-art general-purpose g2p converter that produces an initial transcription and a name category specific phoneme-to-phoneme (p2p) converter that aims at correcting the mistakes made by the g2p converter. The main body of the paper describes a novel methodology for the automatic construction of the p2p converter. The methodology is implemented in a software toolbox that will be made publicly available in a form that will permit the user to design a p2p converter for an arbitrary name category. To give a proof of concept, the toolbox was used for the development of three p2p converters for first names, surnames and geographical names respectively. The obtained systems are small (few rules) and effective: significant improvements (up to 50% relative) of the grapheme-to-phoneme conversion are obtained. These encouraging results call for further development and improvement of the approach.

2004

The COST278 Pan-European Broadcast News Database
An Vandecatseye | Jean-Pierre Martens | Joao Neto | Hugo Meinedo | Carmen Garcia-Mateo | Javier Dieguez | France Mihelic | Janez Zibert | Jan Nouza | Petr David | Matus Pleva | Anton Cizmar | Harris Papageorgiou | Christina Alexandris
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A Spoken Afrikaans Language Resource Designed for Research on Pronunciation Variations
Daan Wissing | Jean-Pierre Martens | Ulrike Janke | Wim Goedertier
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Annotation of prominent words, prosodic boundaries and segmental lengthening by non-expert transcribers in the Spoken Dutch Corpus
Jeska Buhmann | Johanneke Caspers | Vincent J. van Heuven | Heleen Hoekstra | Jean-Pierre Martens | Marc Swerts
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Word Segmentation in the Spoken Dutch Corpus
Jean-Pierre Martens | Diana Binnenpoorte | Kris Demuynck | Ruben Van Parys | Tom Laureys | Wim Goedertier | Jacques Duchateau
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Experiences from the Spoken Dutch Corpus Project
Nelleke Oostdijk | Wim Goedertier | Frank van Eynde | Louis Boves | Jean-Pierre Martens | Michael Moortgat | Harald Baayen
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Orthographic Transcription of the Spoken Dutch Corpus
Wim Goedertier | Simo Goddijn | Jean-Pierre Martens
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)