Maxime Amblard


Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues
Chuyuan Li | Patrick Huber | Wen Xiao | Maxime Amblard | Chloe Braud | Giuseppe Carenini
Findings of the Association for Computational Linguistics: EACL 2023

Discourse processing suffers from data sparsity, especially for dialogues. As a result, we explore approaches to infer latent discourse structures for dialogues, based on attention matrices from Pre-trained Language Models (PLMs). We investigate multiple auxiliary tasks for fine-tuning and show that the dialogue-tailored Sentence Ordering task performs best. To locate and exploit discourse information in PLMs, we propose an unsupervised and a semi-supervised method. Our proposals thereby achieve encouraging results on the STAC corpus, with F1 scores of 57.2 and 59.3 for the unsupervised and semi-supervised methods, respectively. When restricted to projective trees, our scores improved to 63.3 and 68.1.


A Multi-Party Dialogue Ressource in French
Maria Boritchev | Maxime Amblard
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present Dialogues in Games (DinG), a corpus of manual transcriptions of real-life, oral, spontaneous multi-party dialogues between French-speaking players of the board game Catan. Our objective is to make available a quality resource for French, composed of long dialogues, to facilitate their study in the style of (Asher et al., 2016). In a general dialogue setting, participants share personal information, which makes it impossible to disseminate the resource freely and openly. In DinG, the attention of the participants is focused on the game, which prevents them from talking about themselves. In addition, we are conducting a study on the nature of the questions in dialogue, through annotation (Cruz Blandon et al., 2019), in order to develop more natural automatic dialogue systems.

Quantification Annotation in ISO 24617-12, Second Draft
Harry Bunt | Maxime Amblard | Johan Bos | Karën Fort | Bruno Guillaume | Philippe de Groote | Chuyuan Li | Pierre Ludmann | Michel Musiol | Siyana Pavlova | Guy Perrier | Sylvain Pogodalla
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes the continuation of a project that aims at establishing an interoperable annotation schema for quantification phenomena as part of the ISO suite of standards for semantic annotation, known as the Semantic Annotation Framework. After a break, caused by the Covid-19 pandemic, the project was relaunched in early 2022 with a second working draft of an annotation scheme, which is discussed in this paper. Keywords: semantic annotation, quantification, interoperability, annotation schema, ISO standard

Multi-Task Learning for Depression Detection in Dialogs
Chuyuan Li | Chloé Braud | Maxime Amblard
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Depression is a serious mental illness that impacts the way people communicate, especially through their emotions, and, allegedly, the way they interact with others. This work examines depression signals in dialogs, a less studied setting that suffers from data sparsity. We hypothesize that depression and emotion can inform each other, and we propose to explore the influence of dialog structure through topic and dialog act prediction. We investigate a Multi-Task Learning (MTL) approach, where all tasks mentioned above are learned jointly with dialog-tailored hierarchical modeling. We experiment on the DAIC and DailyDialog corpora – both contain dialogs in English – and show important improvements over state-of-the-art on depression detection (at best 70.6% F1), which demonstrates the correlation of depression with emotion and dialog organization and the power of MTL to leverage information from different sources.

Graph Querying for Semantic Annotations
Maxime Amblard | Bruno Guillaume | Siyana Pavlova | Guy Perrier
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

This paper presents how the online tool Grew-match can be used to make queries and visualise data from existing semantically annotated corpora. A dedicated syntax is available to construct simple to complex queries and execute them against a corpus. Such queries give transverse views of the annotated data; these views can help check the consistency of annotations in one corpus or across several corpora. Grew-match can then be seen as an error mining tool: when inconsistencies are detected, it helps find the sentences which should be fixed. Finally, Grew-match can also be used as a side tool to assist annotation tasks, helping to find annotation examples in existing corpora to be compared with the data to be annotated.

How much of UCCA can be predicted from AMR?
Siyana Pavlova | Maxime Amblard | Bruno Guillaume
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

In this paper, we consider two of the currently popular semantic frameworks: Abstract Meaning Representation (AMR) - a more abstract framework, and Universal Conceptual Cognitive Annotation (UCCA) - an anchored framework. We use a corpus-based approach to build two graph rewriting systems, a deterministic and a non-deterministic one, from the former to the latter framework. We present their evaluation and a number of ambiguities that we discovered while building our rules. Finally, we provide a discussion and some future work directions in relation to comparing semantic frameworks of different flavors.


GECko+: a Grammatical and Discourse Error Correction Tool
Eduardo Calò | Léo Jacqmin | Thibo Rosemplatt | Maxime Amblard | Miguel Couceiro | Ajinkya Kulkarni
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 : Démonstrations

We introduce GECko+, a web-based writing assistance tool for English that corrects errors both at the sentence and at the discourse level. It is based on two state-of-the-art models for grammar error correction and sentence ordering. GECko+ is available online as a web application that implements a pipeline combining the two models.

A New Broad NLP Training from Speech to Knowledge
Maxime Amblard | Miguel Couceiro
Proceedings of the Fifth Workshop on Teaching NLP

In 2018, the Master Sc. in NLP opened at IDMC - Institut des Sciences du Digital, du Management et de la Cognition, Université de Lorraine - Nancy, France. Far from being a creation ex nihilo, it is the product of a history and many reflections on the field and its teaching. This article proposes epistemological and critical elements on the opening and maintenance of this still-new master’s program in NLP.

Investigating non lexical markers of the language of schizophrenia in spontaneous conversations
Chuyuan Li | Maxime Amblard | Chloé Braud | Caroline Demily | Nicolas Franck | Michel Musiol
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

We investigate linguistic markers associated with schizophrenia in clinical conversations by detecting predictive features among French-speaking patients. Dealing with human-human dialogues makes for a realistic situation, but it calls for strategies to represent the context and face data sparsity. We compare different approaches for data representation – from individual speech turns to entire conversations –, and data modeling, using lexical, morphological, syntactic, and discourse features, dimensions presumed to be tightly connected to the language of schizophrenia. Previous English models were mostly lexical and reached high performance, here replicated (93.7% acc.). However, our analysis reveals that these models are heavily biased, which probably concerns most datasets on this task. Our new delexicalized models are more general and robust, with the best accuracy score at 77.9%.


A French Version of the FraCaS Test Suite
Maxime Amblard | Clément Beysson | Philippe de Groote | Bruno Guillaume | Sylvain Pogodalla
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents a French version of the FraCaS test suite. This test suite, originally written in English, contains problems illustrating semantic inference in natural language. We describe the linguistic choices we had to make when translating the FraCaS test suite into French, and discuss some of the issues raised by the translation. We also report an experiment we ran in order to test both the translation and the logical semantics underlying the problems of the test suite. This provides a way of checking formal semanticists’ hypotheses against the actual semantic capacity of speakers (in the present case, French speakers), and allows us to compare our results with those of similar experiments conducted for other languages.

Investigation par méthodes d’apprentissage des spécificités langagières propres aux personnes avec schizophrénie (Investigating Learning Methods Applied to Language Specificity of Persons with Schizophrenia)
Maxime Amblard | Chloé Braud | Chuyuan Li | Caroline Demily | Nicolas Franck | Michel Musiol
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

We present experiments aimed at automatically identifying patients with symptoms of schizophrenia in controlled conversations between patients and psychotherapists. We merge all the speech turns of each speaker and train classification models using lexical, morphological, and syntactic information. This study is the first of its kind on French and obtains results comparable to those on English. Our first experiments tend to show that the speech of people with schizophrenia differs from that of controls: the best model reaches an accuracy of 93.66%. Richer information will nevertheless be needed to obtain a robust model.

Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). 2e atelier Éthique et TRaitemeNt Automatique des Langues (ETeRNAL)
Gilles Adda | Maxime Amblard | Karën Fort
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). 2e atelier Éthique et TRaitemeNt Automatique des Langues (ETeRNAL)


A compositional view of questions
Maria Boritchev | Maxime Amblard
Proceedings of the 2019 Workshop on Widening NLP

We present research on a compositional treatment of questions in the style of neo-Davidsonian event semantics. Our work is based on (Champollion, 2011), where only declarative sentences were considered. Our research is based on complex formal examples, paving the way towards further research in this domain and further testing on real-life corpora.

Toward Dialogue Modeling: A Semantic Annotation Scheme for Questions and Answers
María Andrea Cruz Blandón | Gosse Minnema | Aria Nourbakhsh | Maria Boritchev | Maxime Amblard
Proceedings of the 13th Linguistic Annotation Workshop

The present study proposes an annotation scheme for classifying the content and discourse contribution of question-answer pairs. We propose detailed guidelines for using the scheme and apply them to dialogues in English, Spanish, and Dutch. Finally, we report on initial machine learning experiments for automatic annotation.


Modal Subordination in Type Theoretic Dynamic Logic
Sai Qian | Philippe de Groote | Maxime Amblard
Linguistic Issues in Language Technology, Volume 14, 2016 - Modality: Logic, Semantics, Annotation, and Machine Learning

Classical theories of discourse semantics, such as Discourse Representation Theory (DRT) and Dynamic Predicate Logic (DPL), predict that an indefinite noun phrase cannot serve as antecedent for an anaphor if the noun phrase is, but the anaphor is not, in the scope of a modal expression. However, this prediction meets with counterexamples. The phenomenon of modal subordination is one of them. In general, modal subordination is concerned with more than two modalities, where the modality in subsequent sentences is interpreted in a context ‘subordinate’ to the one created by the first modal expression. In other words, subsequent sentences are interpreted as being conditional on the scenario introduced in the first sentence. One consequence is that the anaphoric potential of indefinites may extend beyond the standard limits of accessibility constraints. This paper aims to give a formal interpretation of modal subordination. The theoretical backbone of the current work is Type Theoretic Dynamic Logic (TTDL), which is a Montagovian account of discourse semantics. Unlike other dynamic theories, TTDL was built on classical mathematical and logical tools, such as λ-calculus and Church’s theory of types. Hence it is completely compositional and does not suffer from the destructive assignment problem. We will review the basic set-up of TTDL and then present Kratzer’s theory of natural language modality. After that, by integrating the notion of conversational background, in particular the modal base usage, we offer an extension of TTDL (called Modal-TTDL, or M-TTDL for short) which properly deals with anaphora across modality. The formal relation between Modal-TTDL and TTDL will be discussed as well.


Quantitative study of disfluencies in schizophrenics’ speech: Automatize to limit biases (Étude quantitative des disfluences dans le discours de schizophrènes : automatiser pour limiter les biais) [in French]
Maxime Amblard | Karën Fort
Proceedings of TALN 2014 (Volume 1: Long Papers)


Une analyse basée sur la S-DRT pour la modélisation de dialogues pathologiques (An analysis based on the S-DRT for modeling pathological dialogues)
Maxime Amblard | Michel Musiol | Manuel Rebuschi
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

In this article, we present the definition and study of a corpus of dialogues between a person with schizophrenia and an interlocutor whose objective is to conduct and maintain the exchange. We identified significant discontinuities in paranoid schizophrenics. A representation derived from S-DRT (its pragmatic part) makes it possible to account for these non-standard usages.


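Discourse Representation Theory et graphes sémantiques : formalisation sémantique en contexte industriel (Discourse Representation Theory and semantic graphs: semantic formalization in an industrial context)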
Maxime Amblard | Johannes Heinecke | Estelle Maillebuau
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This work presents an extension of the formal representations for semantics in the natural language processing tool of Orange Labs. We address here only issues relating to the construction of semantic representations, within the framework of linguistic analysis. In order to obtain finer representations of the argument structure of utterances, we include concepts from DRT in the representation system based on semantic graphs, so as to account for the notion of scope.


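Synchronisation syntaxe sémantique, des grammaires minimalistes catégorielles (GMC) aux Constraint Languages for Lambda Structures (CLLS) (Syntax-semantics synchronization, from categorial minimalist grammars (GMC) to Constraint Languages for Lambda Structures (CLLS))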
Maxime Amblard
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (articles courts)

This work builds on the computational and logical approach of Ed Stabler (?), which formalizes Noam Chomsky's minimalist program (?) as a grammar. The question I want to address is how, starting from a syntactic analysis, to recover the predicative form of the utterance. This requires setting up an interface between syntax and semantics. This is what I propose, using Grammaires Minimalistes Catégorielles (GMC, categorial minimalist grammars), an extension of minimalist grammars (GM) towards the Lambek calculus. This new formalism allows a simple synchronization with the lambda calculus. Among the questions frequently encountered in natural language processing, I examine the performance of this interface for resolving quantifier scope problems. I show why and how a more elaborate lambda calculus must be used to obtain the different readings, using Constraint Languages for Lambda Structures (CLLS).