Xuan-Nga Cao

Also published as: Xuân-Nga Cao, Xuân-Nga Cao Kam


2020

Seshat: a Tool for Managing and Verifying Annotation Campaigns of Audio Data
Hadrien Titeux | Rachid Riad | Xuan-Nga Cao | Nicolas Hamilakis | Kris Madden | Alejandrina Cristia | Anne-Catherine Bachoud-Lévi | Emmanuel Dupoux
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce Seshat, a new, simple, open-source tool for efficiently managing annotations of speech corpora. Seshat allows users to easily customise and manage annotations of large audio corpora while ensuring compliance with the formatting and naming conventions of the annotated output files. In addition, it includes procedures for checking the content of annotations against specific rules that can be implemented in personalised parsers. Finally, we propose a double-annotation mode, for which Seshat automatically computes an associated inter-annotator agreement using the gamma measure, which takes into account both categorisation and segmentation discrepancies.
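As a rough illustration of the double-annotation workflow, a gamma agreement of this kind can be computed with the pygamma-agreement package from the same group; the minimal sketch below assumes its Continuum/compute_gamma API and uses invented annotators, segments and labels rather than anything from the paper.

```python
# Sketch: gamma inter-annotator agreement for one doubly-annotated audio file.
# The pygamma-agreement API shown here is an assumption and may differ
# slightly across versions.
from pyannote.core import Segment
from pygamma_agreement import Continuum

continuum = Continuum()

# Two annotators segmenting and labelling the same recording (made-up data).
continuum.add("annotator_1", Segment(0.0, 1.4), "speech")
continuum.add("annotator_1", Segment(1.4, 2.0), "noise")
continuum.add("annotator_2", Segment(0.1, 1.5), "speech")
continuum.add("annotator_2", Segment(1.5, 2.1), "silence")

# Gamma compares the observed disorder of the best alignment of units to the
# disorder expected by chance; a value of 1.0 means perfect agreement, and
# both segmentation and categorisation mismatches lower the score.
gamma_results = continuum.compute_gamma()
print(f"gamma = {gamma_results.gamma:.3f}")
```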

2018

BabyCloud, a Technological Platform for Parents and Researchers
Xuân-Nga Cao | Cyrille Dakhlia | Patricia Del Carmen | Mohamed-Amine Jaouani | Malik Ould-Arbi | Emmanuel Dupoux
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2014

Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems
Bogdan Ludusan | Maarten Versteegh | Aren Jansen | Guillaume Gravier | Xuan-Nga Cao | Mark Johnson | Emmanuel Dupoux
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or raw speech has seen increasing interest in recent years, from both a theoretical and a practical standpoint. Yet there is no commonly accepted evaluation method for systems performing term discovery. Here, we propose such an evaluation toolbox, drawing ideas from both speech technology and natural language processing. We first transform the speech-based output into a symbolic representation and compute five types of evaluation metrics on this representation: the quality of the acoustic matching, the quality of the clusters found, and the quality of the alignment with real words (type, token, and boundary scores). We tested our approach on two term discovery systems taking speech as input, and one using symbolic input. The latter was run on both the gold transcription and a transcription obtained from an automatic speech recognizer, in order to simulate the case where only imperfect symbolic information is available. The results are analysed using the proposed evaluation metrics, and the implications of these metrics are discussed.
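To make the word-level part of these metrics concrete, the sketch below shows how type, token or boundary scores reduce to precision, recall and F-score between the units discovered by a system and the gold words; the data and helper names are hypothetical and are not the toolbox's actual interface.

```python
# Sketch: precision/recall/F-score over sets of discovered vs. gold items,
# as used for the type, token and boundary metrics described above.

def precision_recall_f(found, gold):
    """Compare two sets of items (e.g. word tokens or boundary positions)."""
    hits = len(found & gold)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Token-level example: each token is a (file, start, end) fragment (made-up data).
gold_tokens = {("file1", 0, 3), ("file1", 4, 7), ("file2", 0, 2)}
found_tokens = {("file1", 0, 3), ("file2", 0, 2), ("file2", 3, 5)}

p, r, f = precision_recall_f(found_tokens, gold_tokens)
print(f"token precision={p:.2f} recall={r:.2f} F={f:.2f}")
```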

2005

Statistics vs. UG in Language Acquisition: Does a Bigram Analysis Predict Auxiliary Inversion?
Xuân-Nga Cao Kam | Iglika Stoyneshka | Lidiya Tornyova | William Gregory Sakas | Janet Dean Fodor
Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition