Roxane Bertrand
Despite growing interest in discourse-related tasks, the limited quantity and diversity of discourse-annotated data remain a major issue. Existing resources are largely based on written corpora, while spoken conversational genres are underrepresented. Although discourse segmentation into elementary discourse units (EDUs) is considered nearly solved for canonical written texts, transcripts of spontaneous conversational speech present different challenges. In this paper, we introduce a large French corpus of segmented meeting dialogues, comprising 20 hours of manually transcribed and discourse-annotated conversations and 80 hours of automatically transcribed and discourse-segmented data. We describe our annotation campaign, discuss inter-annotator agreement and segmentation guidelines, and present results from fine-tuning a model for EDU segmentation on this resource.
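A minimal sketch of how EDU segmentation can be cast as token classification, assuming a French checkpoint such as camembert-base; the label scheme and example sentence are illustrative, not the paper's actual configuration:

```python
# Hypothetical sketch: EDU segmentation as binary token classification.
# The checkpoint and label scheme are assumptions, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "camembert-base", num_labels=2  # 0 = inside an EDU, 1 = EDU boundary
)

sentence = "donc on commence euh je vous présente l'ordre du jour"
enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits   # shape (1, seq_len, 2)
boundaries = logits.argmax(-1)[0]  # 1 marks a predicted EDU boundary token
```

Under this framing, the classification head would be fine-tuned on the manually segmented transcripts before being applied to the automatically transcribed portion.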
We present the MEETING corpus, a dataset of roughly 95 hours of spontaneous meeting-style conversations in French. The corpus is designed to serve as a foundation for downstream tasks such as meeting summarization. In its current state, it offers 25 hours of manually corrected transcripts aligned with the audio signal, making it a valuable resource for evaluating ASR and speaker recognition systems. It also includes automatic transcripts and alignments for the whole corpus, which can be used for downstream NLP tasks. The aim of this paper is to describe the conception, production and annotation of the corpus up to the transcription level, and to provide statistics that shed light on its main linguistic features.
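One hedged illustration of using the manually corrected, audio-aligned transcripts for ASR evaluation: scoring a hypothesis against the reference with word error rate via the jiwer library (the example strings are invented):

```python
# Illustrative WER check; jiwer.wer is a real call, the strings are made up.
import jiwer

reference = "alors on reprend le point deux de l'ordre du jour"
hypothesis = "alors on reprend le point de l'ordre du jour"
print(f"WER = {jiwer.wer(reference, hypothesis):.2f}")
```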
In human communication, feedback plays a pivotal role in shaping the dynamics of conversations. This study examines the relationship between listener feedback, narration quality and distraction effects. We present an analysis conducted on the SMYLE corpus, specifically enriched for this study, in which 30 dyads of participants engaged in 1) face-to-face storytelling (8.2 hours) followed by 2) free conversation (7.8 hours). The storytelling task unfolds in two conditions, where a storyteller engages with either a “normal” or a “distracted” listener. Examining the impact of feedback on storytellers, we find a positive correlation between the frequency of specific feedback and narration quality in the normal condition, an encouraging result regarding the enhancement of interaction through specific feedback in distraction-free settings. In the distracted condition, by contrast, a negative correlation emerges, suggesting that increased specific feedback may disrupt narration quality and underscoring the complexity of feedback dynamics in human communication. The contribution of this paper is twofold: first, a new and highly enriched resource for the analysis of discourse phenomena in controlled and normal conditions; second, new results on feedback production, its form and its consequences for discourse quality (with direct applications in human-machine interaction).
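A toy sketch of the correlation analysis, assuming Pearson's r over per-session measurements; the variable names and values are hypothetical placeholders, not SMYLE data:

```python
# Hypothetical illustration of the feedback-frequency vs. narration-quality
# correlation; all values below are placeholders, not corpus data.
from scipy.stats import pearsonr

specific_feedback_rate = [4.2, 1.1, 3.5, 2.8]  # e.g. per minute, per session
narration_quality = [6.8, 4.0, 6.1, 5.5]       # e.g. rated quality scores

r, p = pearsonr(specific_feedback_rate, narration_quality)
print(f"r = {r:.2f}, p = {p:.3f}")  # positive r would match the normal condition
```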
The aim of this study is to investigate conversational feedback that contains smiles and laughter. First, we propose a statistical analysis of smiles and laughs used as generic and specific feedback in a corpus of French talk-in-interaction. Our results show that low-intensity smiles are preferentially used to produce generic feedback, while high-intensity smiles and laughs are preferentially used to produce specific feedback. Second, using a machine learning approach, we propose a hierarchical classification of feedback that automatically predicts not only the presence or absence of a smile but also the type of smile on an intensity scale (low or high).
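A minimal sketch of the two-stage (hierarchical) scheme: first predict smile presence, then intensity among smiling items. The features, labels and classifier choice are assumptions for illustration; the paper does not commit to this implementation:

```python
# Minimal two-stage sketch: stage 1 predicts smile presence, stage 2 predicts
# intensity among smiling items. Features, labels and classifiers are
# placeholders; the paper does not specify this implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(200, 10)               # placeholder feedback features
has_smile = np.random.randint(0, 2, 200)  # stage 1 labels
intensity = np.random.randint(0, 2, 200)  # stage 2 labels (0 = low, 1 = high)

stage1 = RandomForestClassifier().fit(X, has_smile)
mask = has_smile == 1                     # stage 2 is trained on smiles only
stage2 = RandomForestClassifier().fit(X[mask], intensity[mask])

def classify(x):
    if stage1.predict(x.reshape(1, -1))[0] == 0:
        return "no smile"
    high = stage2.predict(x.reshape(1, -1))[0]
    return "high-intensity smile" if high else "low-intensity smile"

print(classify(X[0]))
```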
This paper presents an original dataset of controlled interactions focusing on the study of feedback items. It consists of recordings of conversations between a doctor and a patient, both played by actors. In this corpus, the patient is mainly a listener and produces various feedback items, some of them deliberately incongruent. Moreover, these conversations have been re-synthesized in a virtual reality context in which the patient is played by an artificial agent. The final corpus comprises movies of the human-human conversations plus the same conversations replayed in a human-machine context, resulting in the first human-human/human-machine parallel corpus. The corpus is enriched with multimodal annotations at the verbal and non-verbal levels. Moreover, and this is the first dataset of its type, we designed an experiment in which participants watched the movies and evaluated the interaction. During this task, we recorded the participants' brain signals. The Brain-IHM dataset is thus conceived with a triple purpose: 1/ studying feedback by comparing congruent vs. incongruent feedback; 2/ comparing human-human and human-machine feedback production; 3/ studying the brain basis of feedback perception.
This paper presents a quantitative description of laughter in eight 1-hour French spontaneous conversations. It includes the raw figures for laughter as well as details concerning inter-individual variability. It first describes the extent to which the amount and duration of laughter vary from speaker to speaker across the dialogues. A second set of analyses compares our corpus with previously analyzed corpora. A final set of experiments presents findings on overlapping laughs. To our knowledge, this is the first time all of these effects have been quantified in free-style conversations.
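A toy sketch of the overlap analysis: two laughs overlap when their time intervals intersect. The interval values are invented placeholders, not corpus measurements:

```python
# Toy overlap detection between two speakers' laugh intervals (in seconds).
def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    return a[0] < b[1] and b[0] < a[1]

speaker1_laughs = [(12.0, 13.4), (58.2, 59.0)]  # placeholder intervals
speaker2_laughs = [(12.9, 14.1), (80.5, 81.2)]

shared = [(a, b) for a in speaker1_laughs for b in speaker2_laughs
          if overlaps(a, b)]
print(len(shared))  # 1 overlapping laugh episode in this toy example
```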
Several previous attempts have been made to annotate the communicative functions of verbal feedback utterances in English. Here, we propose an annotation scheme for verbal and non-verbal feedback utterances in French, including the categories base, attitude, previous and visual. The data comprise conversations, maptasks and negotiations, from which we extracted ca. 13,000 candidate feedback utterances and gestures. Twelve students were recruited for the annotation campaign of ca. 9,500 instances, each annotated by between 2 and 7 raters. The evaluation of annotation agreement resulted in an average best-pair kappa of 0.6. While the base category, with the values acknowledgement, evaluation, answer and elicit, achieves good agreement, this is not the case for the other main categories. The data sets, which also include automatic extractions of lexical, positional and acoustic features, are freely available and will further be used in machine-learning classification experiments to analyse the form-function relationship of feedback.
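One plausible reconstruction of the best-pair kappa metric, sketched with scikit-learn's Cohen's kappa: per rater, keep the best kappa over all pairs involving that rater, then average. The ratings shown are hypothetical, not the campaign's data:

```python
# Hypothetical "average best-pair kappa" computation; the label data are
# invented and the exact metric definition is an assumption.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

ratings = {  # rater -> labels on a shared subset of instances
    "r1": ["ack", "eval", "ack", "answer"],
    "r2": ["ack", "eval", "elicit", "answer"],
    "r3": ["eval", "eval", "ack", "answer"],
}

pair_kappas = {
    (a, b): cohen_kappa_score(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}
best_per_rater = {
    r: max(k for pair, k in pair_kappas.items() if r in pair)
    for r in ratings
}
print(sum(best_per_rater.values()) / len(best_per_rater))
```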
This paper investigates the discursive phenomenon of other-repetition (OR), particularly in the context of spontaneous French dialogue, focusing on automatic detection and characterization. A method is proposed to automatically retrieve ORs, based on rules applied to the lexical material only. This detection process has been used to label other-repetitions in 8 dialogues of the CID (Corpus of Interactional Data). Evaluation on one speaker yields good results, with an F1-measure of 0.85. The retrieved OR occurrences are then described statistically: number of words, distance, etc.
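A toy rule-based detector in the spirit of the lexical rules described above: a turn is flagged as a candidate OR when it reuses enough content words from the other speaker's preceding turn. The stopword list and threshold are illustrative assumptions, not the paper's rules:

```python
# Toy lexical OR detector; stopwords and threshold are illustrative only.
STOPWORDS = {"le", "la", "les", "de", "un", "une", "et", "euh", "ouais"}

def is_other_repetition(prev_turn: str, turn: str, min_shared: int = 2) -> bool:
    prev_words = {w for w in prev_turn.lower().split() if w not in STOPWORDS}
    words = {w for w in turn.lower().split() if w not in STOPWORDS}
    return len(prev_words & words) >= min_shared

print(is_other_repetition("on part samedi matin très tôt",
                          "ah samedi matin d'accord"))  # True
```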
This paper addresses the enrichment of transcriptions for the purpose of automatic phonetization, the process of representing sounds with phonetic signs. There are two general ways to build a phonetization process: rule-based systems (with rules derived by inference or proposed by expert linguists) and dictionary-based solutions, which store as much phonological knowledge as possible in a lexicon. In both cases, phonetization starts from a manual transcription, established according to conventions that can differ depending on the context in which they were developed. The present study focuses on three different enrichments of such transcriptions. Evaluations compare the phonetizations produced by automatic systems to a manually phonetized reference. The test corpus comprises three types of speech: conversational speech, read speech and political debate. A specific algorithm is proposed for the rule-based system to handle the enrichments. The final system achieves about 95.2% correct phonetization (error rates ranging from 3.7% to 5.6% depending on the corpus).
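A hedged sketch of the dictionary-based strategy with a rule-based fallback for out-of-vocabulary words; the lexicon entries, SAMPA-like symbols and fallback rule are illustrative and do not reproduce the evaluated systems:

```python
# Illustrative dictionary lookup with a rule-based fallback; the lexicon and
# the trivial fallback rule are placeholders for real G2P rule chains.
LEXICON = {"bonjour": "b.o~.Z.u.R", "madame": "m.a.d.a.m"}

def fallback_rules(word: str) -> str:
    # Placeholder grapheme-to-phoneme rule: real systems chain many rules.
    return ".".join(word)

def phonetize(transcription: str) -> list[str]:
    return [LEXICON.get(w, fallback_rules(w))
            for w in transcription.lower().split()]

print(phonetize("bonjour madame"))  # ['b.o~.Z.u.R', 'm.a.d.a.m']
```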
This paper presents the outline and performance of an automatic syllable-boundary detection system. Syllabification of phonemes is performed with a rule-based system implemented as a Java program. Phonemes are categorized into 6 classes, and a set of rules is developed and divided into general rules, which apply in all cases, and exception rules, which apply in specific situations. The rules are designed for a French spontaneous speech corpus. Moreover, the phonemes, classes and rules are listed in an external configuration file of the tool (released under a GPL licence), making it easy to adapt to a specific corpus: rules, phoneme encodings or phoneme classes can be added or modified simply by supplying a new configuration file. Finally, performance is evaluated and compared to three other French syllabification systems, showing significant improvements; the automatic output agrees with an expert's syllabification on most syllable boundaries in our corpus.
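A drastically simplified sketch of class-driven syllabification: phonemes are mapped to classes taken from a configuration, and a general rule opens a new syllable at each vowel, keeping one onset consonant when available. The classes and the single rule are illustrative assumptions, not the system's actual rule set:

```python
# Simplified class-driven syllabification; classes and rule are illustrative.
PHONEME_CLASSES = {"a": "V", "e": "V", "i": "V", "o": "V",
                   "p": "O", "t": "O", "k": "O", "l": "L", "R": "L"}

def syllabify(phonemes):
    syllables, current = [], []
    for ph in phonemes:
        is_vowel = PHONEME_CLASSES.get(ph) == "V"
        if is_vowel and current and PHONEME_CLASSES.get(current[-1]) != "V":
            onset = [current.pop()]      # keep one consonant as the onset
            if current:
                syllables.append(current)
            current = onset
        elif is_vowel and current:
            syllables.append(current)    # vowel directly follows a vowel
            current = []
        current.append(ph)
    syllables.append(current)
    return syllables

print(syllabify(["p", "a", "R", "o", "l"]))  # [['p', 'a'], ['R', 'o', 'l']]
```

Keeping the rules in an external file, as the tool does, means adaptations like new phoneme classes require no code changes.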
Large annotation projects, typically those addressing multimodal annotation, in which many different kinds of information must be encoded, need to elaborate precise, high-level annotation schemes. This first requires defining the structure of the information: the different objects and their organization. This stage should be as independent as possible of coding-language constraints, which is why we propose a preliminary formal annotation model represented with typed feature structures. The representation requires a precise definition of the different objects, their properties (or features) and their relations, expressed in terms of type hierarchies. This approach has been used to specify the annotation scheme of a large multimodal annotation project (OTIM) and tested in the annotation of a multimodal corpus (CID, Corpus of Interactional Data). The project aims at collecting, annotating and exploiting a dialogue video corpus from a multimodal perspective (including the speech and gesture modalities). The corpus itself consists of 8 hours of dialogue, fully transcribed and richly annotated (phonetics, syntax, pragmatics, gesture, etc.).
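A hedged illustration of the typed-feature-structure idea: annotation objects are typed, carry features, and relate to one another. The types and features below are invented for illustration and do not reproduce OTIM's schema:

```python
# Invented illustration of typed feature structures as typed objects with
# features and relations; not OTIM's actual annotation model.
from dataclasses import dataclass, field

@dataclass
class Token:
    form: str
    start: float  # seconds into the signal
    end: float

@dataclass
class Gesture:
    phase: str                     # e.g. "stroke"
    handedness: str                # e.g. "right"
    aligned_tokens: list[Token] = field(default_factory=list)

tok = Token(form="voilà", start=12.31, end=12.58)
g = Gesture(phase="stroke", handedness="right", aligned_tokens=[tok])
```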
The paper presents a project of the Laboratoire Parole & Langage that aims to collect, annotate and exploit a corpus of spoken French from a multimodal perspective. The project directly addresses a current need in linguistics, where a growing number of researchers recognize that a theory of communication seeking to describe real interactions must take their complexity into account. To do so, linguists need access to spoken corpora annotated at different levels. The paper presents the annotation schemes used at the LPL for phonetics, morphology and syntax, prosody, and gesture, together with the types of linguistic description derived from these annotations, illustrated with two examples.