A LDA-Based Topic Classification Approach From Highly Imperfect Automatic Transcriptions

Mohamed Morchid, Richard Dufour, Georges Linarès


Abstract
Although current transcription systems can achieve high recognition performance, they still have considerable difficulty transcribing speech in very noisy environments. Transcription quality has a direct impact on classification tasks that use text features. In this paper, we propose to identify the themes of telephone conversation services with the classical Term Frequency-Inverse Document Frequency method using the Gini purity criterion (TF-IDF-Gini) and with a Latent Dirichlet Allocation (LDA) approach. Both approaches are coupled with a Support Vector Machine (SVM) classifier to solve the theme identification problem. Results show the effectiveness of the proposed LDA-based method compared to the classical TF-IDF-Gini approach in the context of highly imperfect automatic transcriptions. Finally, we discuss the impact of discriminative and non-discriminative words extracted by both methods in terms of transcription accuracy.
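As a rough illustration of the TF-IDF-Gini baseline named in the abstract, the sketch below scores each word by the product of its term frequency, its inverse document frequency, and a Gini purity term. The abstract does not give the exact weighting, so the formula used here (purity as the sum over themes of the squared probability of the theme given the word) is an assumption, and the function name `tfidf_gini_scores` and the toy data are hypothetical. In a pipeline like the one described, the top-scoring words would presumably define the feature vector passed to the SVM.

```python
import math
from collections import Counter, defaultdict


def tfidf_gini_scores(docs, labels):
    """Score words by TF * IDF * Gini purity (assumed formula, see lead-in).

    docs   -- list of token lists, one per transcribed conversation
    labels -- theme label of each conversation, aligned with docs
    """
    n_docs = len(docs)
    tf = Counter()                    # corpus-wide term frequency
    df = Counter()                    # number of documents containing the word
    per_theme = defaultdict(Counter)  # word -> theme -> occurrence count
    for tokens, theme in zip(docs, labels):
        tf.update(tokens)
        df.update(set(tokens))
        for word in tokens:
            per_theme[word][theme] += 1

    scores = {}
    for word in tf:
        idf = math.log(n_docs / df[word])
        total = sum(per_theme[word].values())
        # Purity is 1.0 when the word occurs in a single theme only.
        gini = sum((count / total) ** 2 for count in per_theme[word].values())
        scores[word] = tf[word] * idf * gini
    return scores


# Toy, made-up transcriptions with hypothetical theme labels.
docs = [["card", "lost", "card"], ["bus", "late"], ["bus", "lost"]]
labels = ["cards", "schedules", "schedules"]
scores = tfidf_gini_scores(docs, labels)
```

In this toy corpus, "lost" appears under both themes and is therefore penalised by the purity term, while "card" appears under one theme only and keeps its full TF-IDF weight.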
Anthology ID:
L14-1621
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1309–1314
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/8_Paper.pdf
Cite (ACL):
Mohamed Morchid, Richard Dufour, and Georges Linarès. 2014. A LDA-Based Topic Classification Approach From Highly Imperfect Automatic Transcriptions. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1309–1314, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
A LDA-Based Topic Classification Approach From Highly Imperfect Automatic Transcriptions (Morchid et al., LREC 2014)