We present in this paper the first natural conversation corpus recorded with all modalities and neuro-physiological signals. 5 dyads (10 participants) have been recorded three times, during three sessions (30mns each) with 4 days interval. During each session, audio and video are captured as well as the neural signal (EEG with Emotiv-EPOC) and the electro-physiological one (with Empatica-E4). This resource original in several respects. Technically, it is the first one gathering all these types of data in a natural conversation situation. Moreover, the recording of the same dyads at different periods opens the door to new longitudinal investigations such as the evolution of interlocutors’ alignment during the time. The paper situates this new type of resources with in the literature, presents the experimental setup and describes different annotations enriching the corpus.
The mechanisms underlying human communication have been under investigation for decades, but the answer to how understanding between locutors emerges remains incomplete. Interaction theories suggest the development of a structural alignment between the speakers, allowing for the construction of a shared knowledge base (common ground). In this paper, we propose to apply metrics derived from information theory to quantify the amount of information exchanged between participants, the dynamics of information exchanges, to provide an objective way to measure the common ground instantiation. We focus on a corpus of free conversations augmented with prosodic segmentation and an expert annotation of thematic episodes. We show that during free conversations, the amount of information remains globally constant at the scale of the conversation, but varies depending on the thematic structuring, underlining the role of the speaker introducing the theme. We propose an original methodology applied to uncontrolled material.
Usage-based constructionist approaches consider language a structured inventory of constructions, form-meaning pairings of different schematicity and complexity, and claim that the more a linguistic pattern is encountered, the more it becomes accessible to speakers. However, when an expression is unavailable, what processes underlie the interpretation? While traditional answers rely on the principle of compositionality, for which the meaning is built word-by-word and incrementally, usage-based theories argue that novel utterances are created based on previously experienced ones through analogy, mapping an existing structural pattern onto a novel instance. Starting from this theoretical perspective, we propose here a computational implementation of these assumptions. As the principle of compositionality has been used to generate distributional representations of phrases, we propose a neural network simulating the construction of phrasal embedding as an analogical process. Our framework, inspired by word2vec and computer vision techniques, was evaluated on tasks of generalization from existing vectors.
The aim of this study is to investigate conversational feedbacks that contain smiles and laughs. Firstly, we propose a statistical analysis of smiles and laughs used as generic and specific feedbacks in a corpus of French talk-in-interaction. Our results show that smiles of low intensity are preferentially used to produce generic feedbacks while high intensity smiles and laughs are preferentially used to produce specific feedbacks. Secondly, based on a machine learning approach, we propose a hierarchical classification of feedback to automatically predict not only the presence/absence of a smile but, also the type of smiles according to an intensity-scale (low or high).
Prior research has explored the ability of computational models to predict a word semantic fit with a given predicate. While much work has been devoted to modeling the typicality relation between verbs and arguments in isolation, in this paper we take a broader perspective by assessing whether and to what extent computational approaches have access to the information about the typicality of entire events and situations described in language (Generalized Event Knowledge). Given the recent success of Transformers Language Models (TLMs), we decided to test them on a benchmark for the dynamic estimation of thematic fit. The evaluation of these models was performed in comparison with SDM, a framework specifically designed to integrate events in sentence meaning representations, and we conducted a detailed error analysis to investigate which factors affect their behavior. Our results show that TLMs can reach performances that are comparable to those achieved by SDM. However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge, and their predictions often depend on surface linguistic features, such as frequent words, collocations and syntactic patterns, thereby showing sub-optimal generalization abilities.
In linguistics and cognitive science, Logical metonymies are defined as type clashes between an event-selecting verb and an entity-denoting noun (e.g. The editor finished the article), which are typically interpreted by inferring a hidden event (e.g. reading) on the basis of contextual cues. This paper tackles the problem of logical metonymy interpretation, that is, the retrieval of the covert event via computational methods. We compare different types of models, including the probabilistic and the distributional ones previously introduced in the literature on the topic. For the first time, we also tested on this task some of the recent Transformer-based models, such as BERT, RoBERTa, XLNet, and GPT-2. Our results show a complex scenario, in which the best Transformer-based models and some traditional distributional models perform very similarly. However, the low performance on some of the testing datasets suggests that logical metonymy is still a challenging phenomenon for computational modeling.
This paper presents an original dataset of controlled interactions, focusing on the study of feedback items. It consists on recordings of different conversations between a doctor and a patient, played by actors. In this corpus, the patient is mainly a listener and produces different feedbacks, some of them being (voluntary) incongruent. Moreover, these conversations have been re-synthesized in a virtual reality context, in which the patient is played by an artificial agent. The final corpus is made of different movies of human-human conversations plus the same conversations replayed in a human-machine context, resulting in the first human-human/human-machine parallel corpus. The corpus is then enriched with different multimodal annotations at the verbal and non-verbal levels. Moreover, and this is the first dataset of this type, we have designed an experiment during which different participants had to watch the movies and give an evaluation of the interaction. During this task, we recorded participant’s brain signal. The Brain-IHM dataset is then conceived with a triple purpose: 1/ studying feedbacks by comparing congruent vs. incongruent feedbacks 2/ comparing human-human and human-machine production of feedbacks 3/ studying the brain basis of feedback perception.
Dialogue act classification becomes a complex task when dealing with fine-grain labels. Many applications require such level of labelling, typically automatic dialogue systems. We present in this paper a 2-level classification technique, distinguishing between generic and specific dialogue acts (DA). This approach makes it possible to benefit from the very good accuracy of generic DA classification at the first level and proposes an efficient approach for specific DA, based on high-level linguistic features. Our results show the interest of involving such features into the classifiers, outperforming all other feature sets, in particular those classically used in DA classification.
In this paper, we propose a new type of semantic representation of Construction Grammar that combines constructions with the vector representations used in Distributional Semantics. We introduce a new framework, Distributional Construction Grammar, where grammar and meaning are systematically modeled from language use, and finally, we discuss the kind of contributions that distributional models can provide to CxG representation from a linguistic and cognitive perspective.
Distributional Semantic Models have been successfully used for modeling selectional preferences in a variety of scenarios, since distributional similarity naturally provides an estimate of the degree to which an argument satisfies the requirement of a given predicate. However, we argue that the performance of such models on rare verb-argument combinations has received relatively little attention: it is not clear whether they are able to distinguish the combinations that are simply atypical, or implausible, from the semantically anomalous ones, and in particular, they have never been tested on the task of modeling their differences in processing complexity. In this paper, we compare two different models of thematic fit by testing their ability of identifying violations of selectional restrictions in two datasets from the experimental studies.
In this paper, we introduce a new distributional method for modeling predicate-argument thematic fit judgments. We use a syntax-based DSM to build a prototypical representation of verb-specific roles: for every verb, we extract the most salient second order contexts for each of its roles (i.e. the most salient dimensions of typical role fillers), and then we compute thematic fit as a weighted overlap between the top features of candidate fillers and role prototypes. Our experiments show that our method consistently outperforms a baseline re-implementing a state-of-the-art system, and achieves better or comparable results to those reported in the literature for the other unsupervised systems. Moreover, it provides an explicit representation of the features characterizing verb-specific semantic roles.
In theoretical linguistics, logical metonymy is defined as the combination of an event-subcategorizing verb with an entity-denoting direct object (e.g., The author began the book), so that the interpretation of the VP requires the retrieval of a covert event (e.g., writing). Psycholinguistic studies have revealed extra processing costs for logical metonymy, a phenomenon generally explained with the introduction of new semantic structure. In this paper, we present a general distributional model for sentence comprehension inspired by the Memory, Unification and Control model by Hagoort (2013,2016). We show that our distributional framework can account for the extra processing costs of logical metonymy and can identify the covert event in a classification task.
The question of the type of text used as primary data in treebanks is of certain importance. First, it has an influence at the discourse level: an article is not organized in the same way as a novel or a technical document. Moreover, it also has consequences in terms of semantic interpretation: some types of texts can be easier to interpret than others. We present in this paper a new type of treebank which presents the particularity to answer to specific needs of experimental linguistic. It is made of short texts (book backcovers) that presents a strong coherence in their organization and can be rapidly interpreted. This type of text is adapted to short reading sessions, making it easy to acquire physiological data (e.g. eye movement, electroencepholagraphy). Such a resource offers reliable data when looking for correlations between computational models and human language processing.
The question of how to compare languages and more generally the domain of linguistic typology, relies on the study of different linguistic properties or phenomena. Classically, such a comparison is done semi-manually, for example by extracting information from databases such as the WALS. However, it remains difficult to identify precisely regular parameters, available for different languages, that can be used as a basis towards modeling. We propose in this paper, focusing on the question of syntactic typology, a method for automatically extracting such parameters from treebanks, bringing them into a typology perspective. We present the method and the tools for inferring such information and navigating through the treebanks. The approach has been applied to 10 languages of the Universal Dependencies Treebank. We approach is evaluated by showing how automatic classification correlates with language families.
In this paper, we introduce for the first time a Distributional Model for computing semantic complexity, inspired by the general principles of the Memory, Unification and Control framework(Hagoort, 2013; Hagoort, 2016). We argue that sentence comprehension is an incremental process driven by the goal of constructing a coherent representation of the event represented by the sentence. The composition cost of a sentence depends on the semantic coherence of the event being constructed and on the activation degree of the linguistic constructions. We also report the results of a first evaluation of the model on the Bicknell dataset (Bicknell et al., 2010).
La typologie des langues repose sur l’étude de la réalisation de propriétés ou phénomènes linguistiques dans plusieurs langues ou familles de langues. Nous abordons dans cet article la question de la typologie syntaxique et proposons une méthode permettant d’extraire automatiquement ces propriétés à partir de treebanks, puis de les analyser en vue de dresser une telle typologie. Nous décrivons cette méthode ainsi que les outils développés pour la mettre en œuvre. Celle-ci a été appliquée à l’analyse de 10 langues décrites dans le Universal Dependencies Treebank. Nous validons ces résultats en montrant comment une technique de classification permet, sur la base des informations extraites, de reconstituer des familles de langues.
Nous présentons ici 4-couv, un nouveau corpus arboré d’environ 3 500 phrases, constitué d’un ensemble de quatrièmes de couverture, étiqueté et analysé automatiquement puis corrigé et validé à la main. Il répond à des besoins spécifiques pour des projets de linguistique expérimentale, et vise à rester compatible avec les autres treebanks existants pour le français. Nous présentons ici le corpus lui-même ainsi que les outils utilisés pour les différentes étapes de son élaboration : choix des textes, étiquetage, parsing, correction manuelle.
This paper focuses on the representation and querying of knowledge-based multimodal data. This work stands in the OTIM project which aims at processing multimodal annotation of a large conversational French speech corpus. Within OTIM, we aim at providing linguists with a unique framework to encode and manipulate numerous linguistic domains (from prosody to gesture). Linguists commonly use Typed Feature Structures (TFS) to provide an uniform view of multimodal annotations but such a representation cannot be used within an applicative framework. Moreover TFS expressibility is limited to hierarchical and constituency relations and does not suit to any linguistic domain that needs for example to represent temporal relations. To overcome these limits, we propose an ontological approach based on Description logics (DL) for the description of linguistic knowledge and we provide an applicative framework based on OWL DL (Ontology Web Language) and the query language SPARQL.
Large annotation projects, typically those addressing the question of multimodal annotation in which many different kinds of information have to be encoded, have to elaborate precise and high level annotation schemes. Doing this requires first to define the structure of the information: the different objects and their organization. This stage has to be as much independent as possible from the coding language constraints. This is the reason why we propose a preliminary formal annotation model, represented with typed feature structures. This representation requires a precise definition of the different objects, their properties (or features) and their relations, represented in terms of type hierarchies. This approach has been used to specify the annotation scheme of a large multimodal annotation project (OTIM) and experimented in the annotation of a multimodal corpus (CID, Corpus of Interactional Data). This project aims at collecting, annotating and exploiting a dialogue video corpus in a multimodal perspective (including speech and gesture modalities). The corpus itself, is made of 8 hours of dialogues, fully transcribed and richly annotated (phonetics, syntax, pragmatics, gestures, etc.).
Cet article présente un modèle de la complexité syntaxique. Il réunit un ensemble d’indices de complexité et les représente à l’aide d’un cadre formel homogène, offrant ainsi la possibilité d’une quantification automatique : le modèle proposé permet d’associer à chaque phrase un indice reflétant sa complexité.
Un des problèmes majeurs de la linguistique aujourd’hui réside dans la prise en compte de phénomènes relevant de domaines et de modalités différentes. Dans la littérature, la réponse consiste à représenter les relations pouvant exister entre ces domaines de façon externe, en termes de relation de structure à structure, s’appuyant donc sur une description distincte de chaque domaine ou chaque modalité. Nous proposons dans cet article une approche différente permettant représenter ces phénomènes dans un cadre formel unique, permettant de rendre compte au sein d’une même grammaire tous les phénomènes concernés. Cette représentation précise de l’interaction entre domaines et modalités s’appuie sur la définition de relations d’alignement.
Nous montrons dans cet article qu’il existe une corrélation étroite existant entre la qualité de l’étiquetage morpho-syntaxique et les performances des chunkers. Cette corrélation devient linéaire lorsque la taille des chunks est limitée. Nous appuyons notre démonstration sur la base d’une expérimentation conduite suite à la campagne d’évaluation Passage 2007 (de la Clergerie et al., 2008). Nous analysons pour cela les comportements de deux analyseurs ayant participé à cette campagne. L’interprétation des résultats montre que la tâche de chunking, lorsqu’elle vise des chunks courts, peut être assimilée à une tâche de “super-étiquetage”.
The paper presents a project of the Laboratoire Parole & Langage which aims at collecting, annotating and exploiting a corpus of spoken French in a multimodal perspective. The project directly meets the present needs in linguistics where a growing number of researchers become aware of the fact that a theory of communication which aims at describing real interactions should take into account the complexity of these interactions. However, in order to take into account such a complexity, linguists should have access to spoken corpora annotated in different fields. The paper presents the annotation schemes used in phonetics, morphology and syntax, prosody, gestuality at the LPL together with the type of linguistic description made from the annotations seen in two examples.
When a user cannot find a word, he may think of semantically related words that could be used into an automatic process to help him. This paper presents an evaluation of lexical resources and semantic networks for modelling mental associations. A corpus of associations has been constructed for its evaluation. It is composed of 20 low frequency target words each associated 5 times by 20 users. In the experiments we look for the target word in propositions made from the associated words thanks to 5 different resources. The results show that even if each resource has a useful specificity, the global recall is low. An experiment to extract common semantic features of several associations showed that we cannot expect to see the target word below a rank of 20 propositions.
This paper presents the sequential evaluation of the question answering system SQuaLIA. This system is based on the same sequential process as most statistical question answering systems, involving 4 main steps from question analysis to answer extraction.The evaluation is based on a corpus made from 20 questions taken in the set of an evaluation campaign and which were well answered by SQuaLIA. Each of the 20 questions has been typed by 17 native participants, non natives and dyslexics. They were vocally instructed the target of each question. Each of the 4 analysis steps of the system involves a loss of accuracy, until an average of 60 of right answers at the end of the process. The main cause of this loss seems to be the orthographic mistakes users make on nouns.
Cet article décrit une méthode qui combine des hypothèses graphémiques et phonétiques au niveau de la phrase, à l’aide d’une réprésentation en automates à états finis et d’un modèle de langage, pour la réécriture de phrases tapées au clavier par des dysorthographiques. La particularité des écrits dysorthographiés qui empêche les correcteurs orthographiques d’être efficaces pour cette tâche est une segmentation en mots parfois incorrecte. La réécriture diffère de la correction en ce sens que les phrases réécrites ne sont pas à destination de l’utilisateur mais d’un système automatique, tel qu’un moteur de recherche. De ce fait l’évaluation est conduite sur des versions filtrées et lemmatisées des phrases. Le taux d’erreurs mots moyen passe de 51 % à 20 % avec notre méthode, et est de 0 % sur 43 % des phrases testées.
This paper describes the unfolding of the EASy evaluation campaign for french parsers as well as the techniques employed for the participation of laboratory LPL to this campaign. Three symbolic parsers based on a same resource and a same formalism (Property Grammars) are described and evaluated. The first results of this evaluation are analyzed and lead to the conclusion that symbolic parsing in a constraint-based formalism is efficient and robust.
Nous proposons et testons deux méthodes de prédiction de la capacité d’un système à répondre à une question factuelle. Une telle prédiciton permet de déterminer si l’on doit initier un dialogue afin de préciser ou de reformuler la question posée par l’utilisateur. La première approche que nous proposons est une adaptation d’une méthode de prédiction dans le domaine de la recherche documentaire, basée soit sur des machines à vecteurs supports (SVM) soit sur des arbres de décision, avec des critères tels que le contenu des questions ou des documents, et des mesures de cohésion entre les documents ou passages de documents d’où sont extraits les réponses. L’autre approche vise à utiliser le type de réponse attendue pour décider de la capacité du système à répondre. Les deux approches ont été testées sur les données de la campagne Technolangue EQUER des systèmes de questions-réponses en français. L’approche à base de SVM est celle qui obtient les meilleurs résultats. Elle permet de distinguer au mieux les questions faciles, celles auxquelles notre système apporte une bonne réponse, des questions difficiles, celles restées sans réponses ou auxquelles le système a répondu de manière incorrecte. A l’opposé on montre que pour notre système, le type de réponse attendue (personnes, quantités, lieux...) n’est pas un facteur déterminant pour la difficulté d’une question.
Les méthodes d’analyse syntaxiques hybrides, reposant à la fois sur des techniques statistiques et symboliques, restent peu exploitées. Dans la plupart des cas, les informations statistiques sont intégrées à un squelette contextfree et sont utilisées pour contrôler le choix des règles ou des structures. Nous proposons dans cet article une méthode permettant de calculer un indice de corrélation entre deux objets linguistiques (catégories, propriétés). Nous décrivons une utilisation de cette notion dans le cadre de l’analyse des Grammaires de Propriétés. L’indice de corrélation nous permet dans ce cas de contrôler à la fois la sélection des constituants d’une catégorie, mais également la satisfaction des propriétés qui la décrivent.
L’analyse syntaxique reste un problème complexe au point que nombre d’applications n’ont recours qu’à des analyseurs superficiels. Nous faisons dans cet article le point sur les notions d’analyse superficielles et profondes en proposant une première caractérisation de la notion de complexité opérationnelle pour l’analyse syntaxique automatique permettant de distinguer objets et relations plus ou moins difficiles à identifier. Sur cette base, nous proposons un bilan des différentes techniques permettant de caractériser et combiner analyse superficielle et profonde.
Nous présentons une plateforme de développement de lexique offrant une base lexicale accompagnée d’un certain nombre d’outils de maintenance et d’utilisation. Cette base, qui comporte aujourd’hui 440.000 formes du Français contemporain, est destinée à être diffusée et remise à jour régulièrement. Nous exposons d’abord les outils et les techniques employées pour sa constitution et son enrichissement, notamment la technique de calcul des fréquences lexicales par catégorie morphosyntaxique. Nous décrivons ensuite différentes approches pour constituer un sous-lexique de taille réduite, dont la particularité est de couvrir plus de 90% de l’usage. Un tel lexique noyau offre en outre la possibilité d’être réellement complété manuellement avec des informations sémantiques, de valence, pragmatiques etc.
Cet article propose l’introduction d’une notion de densité syntaxique permettant de caractériser la complexité d’un énoncé et au-delà d’introduire la spécification d’un gradient de grammaticalité. Un tel gradient s’avère utile dans plusieurs cas : quantification de la difficulté d’interprétation d’une phrase, gradation de la quantité d’information syntaxique contenue dans un énoncé, explication de la variabilité et la dépendances entre les domaines linguistiques, etc. Cette notion exploite la possibilité de caractérisation fine de l’information syntaxique en termes de contraintes : la densité est fonction des contraintes satisfaites par une réalisation pour une grammaire donnée. Les résultats de l’application de cette notion à quelques corpus sont analysés.
This paper presents a technique for the representation and the implementation of interaction relations between different domains of linguistic analysis. This solution relies on the localization of the linguistic objects in the context. The relations are then implemented by means of interaction constraints, each domain information being expressed independently.
Cet article fournit des éléments d’explication pour la description des relations entre les différents domaines de l’analyse linguistique. Il propose une architecture générale en vue d’une théorie formée de plusieurs niveaux : d’un côté les grammaires de chacun des domaines et de l’autre des relations spécifiant les interactions entre ces domaines. Dans cette approche, chacun des domaines est porteur d’une partie de l’information, celle-ci résultant également de l’interaction entre les domaines.
Nous présentons dans cet article un cadre d’explication des relations entre les différents composants de l’analyse linguistique (prosodie, syntaxe, sémantique, etc.). Nous proposons un principe spécifiant un équilibre pour un objet linguistique donné entre ces différents composants sous la forme d’un poids (précisant l’aspect marqué de l’objet décrit) défini pour chacun d’entre eux et d’un seuil (correspondant à la somme de ces poids) à atteindre. Une telle approche permet d’expliquer certains phénomènes de variabilité : le choix d’une “tournure” à l’intérieur d’un des composants peut varier à condition que son poids n’empêche pas d’atteindre le seuil spécifié. Ce type d’information, outre son intérêt purement linguistique, constitue le premier élément de réponse pour l’introduction de la variabilité dans des applications comme les systèmes de génération ou de synthèse de la parole.
Cet article propose une description des dépendances à distances s’appuyant sur une approche totalement déclarative, les grammaires de propriétés, décrivant l’information linguistique sous la forme de contraintes. L’approche décrite ici consiste à introduire de façon dynamique en cours d’analyse de nouvelles contraintes, appelées propriétés distantes. Cette notion est illustrée par la description du phénomène des disloquées en français.
In this paper, we propose a disambiguating technique called controlled disjunctions. This extension of the so-called named disjunctions relies on the relations existing between feature values (covariation, control, etc.). We show that controlled disjunctions can implement different kind of ambiguities in a consistent and homogeneous way. We describe the integration of controlled disjunctions into a HPSG feature structure representation. Finally, we present a direct implementation by means of delayed evaluation and we develop an example within the functional programming paradigm.