Stéphane Rauzy

Also published as: Stephane Rauzy

2020

Two-level classification for dialogue act recognition in task-oriented dialogues
Philippe Blache | Massina Abderrahmane | Stéphane Rauzy | Magalie Ochs | Houda Oufaida
Proceedings of the 28th International Conference on Computational Linguistics

Dialogue act classification becomes a complex task when dealing with fine-grained labels. Many applications, typically automatic dialogue systems, require such a level of labelling. We present in this paper a two-level classification technique that distinguishes between generic and specific dialogue acts (DA). This approach makes it possible to benefit from the very good accuracy of generic DA classification at the first level and proposes an efficient approach for specific DA, based on high-level linguistic features. Our results show the value of involving such features in the classifiers: they outperform all other feature sets, in particular those classically used in DA classification.

PACO: a Corpus to Analyze the Impact of Common Ground in Spontaneous Face-to-Face Interaction
Mary Amoyal | Béatrice Priego-Valverde | Stephane Rauzy
Proceedings of the 12th Language Resources and Evaluation Conference

PACO is a French audio-video conversational corpus made of 15 face-to-face dyadic interactions, lasting around 20 min each. This comparative corpus has been created in order to explore the impact of the lack of personal common ground (Clark, 1996) on participants' collaboration during conversation and specifically on their smiles during topic transitions. We have built this conversational corpus, “PACO”, by replicating the experimental protocol of “Cheese!” (Priego-Valverde et al., 2018). The only difference between these two corpora is the degree of common ground (CG) of the interlocutors: in Cheese! the interlocutors are friends, while in PACO they do not know each other. This experimental protocol makes it possible to analyze how the participants get acquainted. This study brings two main contributions. First, the PACO conversational corpus makes it possible to compare the impact of the interlocutors’ common ground. Second, the semi-automatic smile annotation protocol yields reliable and reproducible smile annotations while reducing the annotation time by a factor of 10. Keywords: common ground, spontaneous interaction, smile, automatic detection.

2016

4Couv: A New Treebank for French
Philippe Blache | Grégoire de Montcheuil | Laurent Prévot | Stéphane Rauzy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The question of the type of text used as primary data in treebanks is of certain importance. First, it has an influence at the discourse level: an article is not organized in the same way as a novel or a technical document. Moreover, it also has consequences in terms of semantic interpretation: some types of texts can be easier to interpret than others. We present in this paper a new type of treebank designed to answer specific needs of experimental linguistics. It is made of short texts (book back covers) that present a strong coherence in their organization and can be rapidly interpreted. This type of text is suited to short reading sessions, making it easy to acquire physiological data (e.g. eye movement, electroencephalography). Such a resource offers reliable data when looking for correlations between computational models and human language processing.

MarsaGram: an excursion in the forests of parsing trees
Philippe Blache | Stéphane Rauzy | Grégoire Montcheuil
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The question of how to compare languages, and more generally the domain of linguistic typology, relies on the study of different linguistic properties or phenomena. Classically, such a comparison is done semi-manually, for example by extracting information from databases such as WALS. However, it remains difficult to identify precisely regular parameters, available for different languages, that can be used as a basis for modeling. We propose in this paper, focusing on the question of syntactic typology, a method for automatically extracting such parameters from treebanks, bringing them into a typological perspective. We present the method and the tools for inferring such information and navigating through the treebanks. The approach has been applied to 10 languages of the Universal Dependencies Treebank. Our approach is evaluated by showing how automatic classification correlates with language families.

2015

Typologie automatique des langues à partir de treebanks (Automatic Language Typology from Treebanks) [in French]
Philippe Blache | Grégoire de Montcheuil | Stéphane Rauzy
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Language typology relies on the study of how linguistic properties or phenomena are realized across several languages or language families. In this paper we address the question of syntactic typology and propose a method for automatically extracting these properties from treebanks and then analyzing them in order to build such a typology. We describe this method and the tools developed to implement it. It has been applied to the analysis of 10 languages described in the Universal Dependencies Treebank. We validate these results by showing how a classification technique can, on the basis of the extracted information, reconstruct language families.

Création d’un nouveau treebank à partir de quatrièmes de couverture (Creating a New Treebank from Book Back Covers) [in French]
Philippe Blache | Grégoire Moncheuil | Stéphane Rauzy | Marie-Laure Guénot
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

We present 4-couv, a new treebank of about 3,500 sentences, built from a set of book back covers, automatically tagged and parsed, then manually corrected and validated. It meets specific needs of experimental linguistics projects and aims to remain compatible with the other existing treebanks for French. We present the corpus itself as well as the tools used for the different stages of its construction: text selection, tagging, parsing, and manual correction.

2012

Enrichissement du FTB : un treebank hybride constituants/propriétés (Enriching the French Treebank with Properties) [in French]
Philippe Blache | Stéphane Rauzy
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Robustness and processing difficulty models. A pilot study for eye-tracking data on the French Treebank
Stéphane Rauzy | Philippe Blache
Proceedings of the First Workshop on Eye-tracking and Natural Language Processing

2011

Predicting Linguistic Difficulty by Means of a Morpho-Syntactic Probabilistic Model
Philippe Blache | Stéphane Rauzy
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

Large annotation projects, typically those addressing the question of multimodal annotation in which many different kinds of information have to be encoded, have to elaborate precise and high-level annotation schemes. Doing this first requires defining the structure of the information: the different objects and their organization. This stage has to be as independent as possible from the constraints of the coding language. This is why we propose a preliminary formal annotation model, represented with typed feature structures. This representation requires a precise definition of the different objects, their properties (or features) and their relations, represented in terms of type hierarchies. This approach has been used to specify the annotation scheme of a large multimodal annotation project (OTIM) and tested in the annotation of a multimodal corpus (CID, Corpus of Interactional Data). This project aims at collecting, annotating and exploiting a dialogue video corpus in a multimodal perspective (including speech and gesture modalities). The corpus itself is made of 8 hours of dialogues, fully transcribed and richly annotated (phonetics, syntax, pragmatics, gestures, etc.).

2008

Influence de la qualité de l’étiquetage sur le chunking : une corrélation dépendant de la taille des chunks (Influence of Tagging Quality on Chunking: A Correlation Depending on Chunk Size) [in French]
Philippe Blache | Stéphane Rauzy
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We show in this paper that there is a close correlation between the quality of morphosyntactic tagging and the performance of chunkers. This correlation becomes linear when chunk size is limited. Our demonstration is based on an experiment conducted following the Passage 2007 evaluation campaign (de la Clergerie et al., 2008). To this end, we analyze the behavior of two parsers that took part in this campaign. The interpretation of the results shows that the chunking task, when it targets short chunks, can be regarded as a “super-tagging” task.

2006

Mécanismes de contrôle pour l’analyse en Grammaires de Propriétés (Control Mechanisms for Parsing with Property Grammars) [in French]
Philippe Blache | Stéphane Rauzy
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Hybrid syntactic parsing methods, relying on both statistical and symbolic techniques, remain little used. In most cases, statistical information is integrated into a context-free skeleton and is used to control the choice of rules or structures. In this paper we propose a method for computing a correlation index between two linguistic objects (categories, properties). We describe a use of this notion in the context of parsing with Property Grammars. In this case, the correlation index lets us control both the selection of a category's constituents and the satisfaction of the properties that describe it.

Acceptability Prediction by Means of Grammaticality Quantification
Philippe Blache | Barbara Hemforth | Stéphane Rauzy
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

Une plateforme pour l’acquisition, la maintenance et la validation de ressources lexicales (A Platform for the Acquisition, Maintenance and Validation of Lexical Resources) [in French]
Tristan Vanrullen | Philippe Blache | Cristel Portes | Stéphane Rauzy | Jean-François Maeyhieux
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

We present a lexicon development platform offering a lexical database together with a number of maintenance and usage tools. This database, which currently contains 440,000 forms of contemporary French, is intended to be distributed and updated regularly. We first describe the tools and techniques used to build and enrich it, in particular the technique for computing lexical frequencies by morphosyntactic category. We then describe different approaches for building a reduced-size sub-lexicon whose distinctive feature is that it covers more than 90% of usage. Such a core lexicon also makes it feasible to complete it manually with semantic, valency, pragmatic and other information.
