Ruslan Kalitvianski

2020

Analyse sémantique de transcriptions automatiques d’appels téléphoniques en français (Semantic analysis of automatic phone call transcriptions in French)
Emmanuelle Dusserre | Ruslan Kalitvianski | Mathieu Ruhlmann | Muntsa Padró
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

In this paper, we present the implementation of a complete semantic processing pipeline for audio conversations from telephone call centers, from automatic transcription through semantic analysis of the utterances to the exploitation of the results. We describe how the various analyses developed by our team work, as well as the interactive platform that presents the aggregated results of all analyzed conversations.

2018

Notre tweet première fois au DEFT-2018 : systèmes de détection de polarité et de transports (Systems for detecting polarity and public transport discussions in French tweets)
David Graceffa | Armelle Ramond | Emmanuelle Dusserre | Ruslan Kalitvianski | Mathieu Ruhlmann | Muntsa Padró
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

This paper describes the Eloquant team's systems for categorizing French tweets in tasks 1 (detecting the public transport topic) and 2 (detecting overall polarity) of DEFT 2018. Our systems rely on semantic enrichment, machine learning and, for task 1, a symbolic approach. We submitted two runs for each task. Our best F-measures (0.897 for task 1 and 0.800 for task 2) are above the overall average for each task, and place us in the top 30% of all runs for task 2.

2016

Learning to Search for Recognizing Named Entities in Twitter
Ioannis Partalas | Cédric Lopez | Nadia Derbas | Ruslan Kalitvianski
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

In this work we present our participation in the 2nd Named Entity Recognition for Twitter shared task. The task was cast as sequence labeling, and we employed a learning-to-search approach to tackle it. We also leveraged LOD to extract rich contextual features for the named entities. Our submission achieved F-scores of 46.16 and 60.24 for the classification and segmentation tasks, ranking 2nd and 3rd respectively. Post-analysis showed that the LOD features substantially improved the performance of our system, as they counterbalance the lack of context in tweets. The shared task gave us the opportunity to test the performance of NER systems on short and noisy textual data. The results of the participating systems show that the task is far from solved and that methods with stellar performance on normal text need to be revised.

An Aligned French-Chinese corpus of 10K segments from university educational material
Ruslan Kalitvianski | Lingxiao Wang | Valérie Bellynck | Christian Boitet
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

This paper describes a corpus of nearly 10K French-Chinese aligned segments, produced by post-editing machine-translated computer science courseware. The corpus was built from 2013 to 2016 by native Chinese students within the PROJECT_NAME project. The quality, as judged by native speakers, is adequate for understanding (far better than reading only the original French) and for getting better marks. The corpus is annotated at the segment level with a self-assessed quality score. It has been used directly as supplemental training data to build a statistical machine translation system dedicated to that sublanguage, and can be used to extract the specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released.