Nicholas Asher

Also published as: Nicolas Asher

2019

Analyse faiblement supervisée de conversation en actes de dialogue (Weakly supervised dialog act analysis)
Catherine Thompson | Nicholas Asher | Philippe Muller | Jérémy Auguste
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

We focus here on the analysis of chat conversations in a task-oriented setting in which a technical advisor addresses a customer, where the goal is to label utterances with dialogue acts in order to feed downstream conversation analyses. We propose a lightly supervised method based on simple heuristics, a few development annotations, and an ensemble method over these rules, which is used to automatically produce noisy annotations for a larger corpus that can then serve as training data for a supervised model. We compare this approach with a classical supervised approach and show that it achieves very similar results at a lower cost, while being easier to adapt to new data.

Apprentissage faiblement supervisé de la structure discursive (Learning discourse structure using weak supervision)
Sonia Badene | Catherine Thompson | Nicholas Asher | Jean-Pierre Lorré
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

The advent of deep learning techniques has created an enormous need for training data. Such training data are extremely expensive to create, especially when domain expertise is required. One such task is learning the semantic structure of discourse, a very complex task involving recursive structures and sparse data, but one that is essential for extracting deep semantic information from text. We describe our experiments on attaching discourse units to form a structure, using the data programming paradigm, in which few or no annotations are used to build a “noisy” training set. The dialogue corpus used illustrates interesting linguistic and non-linguistic constraints that must be learned. We focus on the structure of the rules used to build a generative model and show that our approach is competitive with classical supervised learning.

Data Programming for Learning Discourse Structure
Sonia Badene | Kate Thompson | Jean-Pierre Lorré | Nicholas Asher
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper investigates the advantages and limits of data programming for the task of learning discourse structure. The data programming paradigm implemented in the Snorkel framework allows a user to label training data using expert-composed heuristics, which are then transformed via the “generative step” into probability distributions of the class labels given the training candidates. These results are later generalized using a discriminative model. Snorkel’s attractive promise to create a large amount of annotated data from a smaller set of training data by unifying the output of a set of heuristics has yet to be used for computationally difficult tasks, such as that of discourse attachment, in which one must decide where a given discourse unit attaches to other units in a text in order to form a coherent discourse structure. Although approaching this problem using Snorkel requires significant modifications to the structure of the heuristics, we show that weak supervision methods can be more than competitive with classical supervised learning approaches to the attachment problem.
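As a rough illustration of the data programming idea described in this abstract, the sketch below applies a few heuristics ("labeling functions") to a candidate pair of discourse units and combines their votes into a soft label. Everything here is an assumption made for illustration: the feature names (`distance`, `same_speaker`, `source_is_question`), the three heuristics, and above all the combination step, which is a plain vote average standing in for the learned generative model; it is not the Snorkel API and not the authors' rule set.

```python
# Label values used by the hypothetical labeling functions.
ABSTAIN, NOT_ATTACHED, ATTACHED = -1, 0, 1


def lf_adjacent(pair):
    # Assumption: adjacent discourse units are often attached.
    return ATTACHED if pair["distance"] == 1 else ABSTAIN


def lf_far_apart(pair):
    # Assumption: units more than five positions apart are rarely attached.
    return NOT_ATTACHED if pair["distance"] > 5 else ABSTAIN


def lf_question_answer(pair):
    # Assumption: a question tends to attach to a reply from another speaker.
    if pair["source_is_question"] and not pair["same_speaker"]:
        return ATTACHED
    return ABSTAIN


LABELING_FUNCTIONS = [lf_adjacent, lf_far_apart, lf_question_answer]


def soft_label(pair):
    """Return the share of non-abstaining votes for ATTACHED, or None if all abstain.

    This unweighted average is a simplification: a generative step would
    instead estimate how reliable each heuristic is and weight it accordingly.
    """
    votes = [lf(pair) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return None
    return sum(v == ATTACHED for v in votes) / len(votes)


if __name__ == "__main__":
    candidate = {"distance": 1, "same_speaker": False, "source_is_question": True}
    print(soft_label(candidate))
```

The soft labels produced this way would then serve as noisy training targets for a discriminative model, which is the generalization step the abstract mentions.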

Weak Supervision for Learning Discourse Structure
Sonia Badene | Kate Thompson | Jean-Pierre Lorré | Nicholas Asher
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper provides a detailed comparison of a data programming approach with (i) off-the-shelf, state-of-the-art deep learning architectures that optimize their representations (BERT) and (ii) handcrafted-feature approaches previously used in the discourse analysis literature. We compare these approaches on the task of learning discourse structure for multi-party dialogue. The data programming paradigm offered by the Snorkel framework allows a user to label training data using expert-composed heuristics, which are then transformed via the “generative step” into probability distributions of the class labels given the data. We show that on our task the generative model outperforms both deep learning architectures as well as more traditional ML approaches when learning discourse structure—it even outperforms the combination of deep learning methods and hand-crafted features. We also implement several strategies for “decoding” our generative model output in order to improve our results. We conclude that weak supervision methods hold great promise as a means for creating and improving data sets for discourse structure.

2018

A Dependency Perspective on RST Discourse Parsing and Evaluation
Mathieu Morey | Philippe Muller | Nicholas Asher
Computational Linguistics, Volume 44, Issue 2 - June 2018

Computational text-level discourse analysis mostly happens within Rhetorical Structure Theory (RST), whose structures have classically been presented as constituency trees, and relies on data from the RST Discourse Treebank (RST-DT); as a result, the RST discourse parsing community has largely borrowed from the syntactic constituency parsing community. The standard evaluation procedure for RST discourse parsers is thus a simplified variant of PARSEVAL, and most RST discourse parsers use techniques that originated in syntactic constituency parsing. In this article, we isolate a number of conceptual and computational problems with the constituency hypothesis. We then examine the consequences, for the implementation and evaluation of RST discourse parsers, of adopting a dependency perspective on RST structures, a view advocated so far only by a few approaches to discourse parsing. While doing that, we show the importance of the notion of headedness of RST structures. We analyze RST discourse parsing as dependency parsing by adapting to RST a recent proposal in syntactic parsing that relies on head-ordered dependency trees, a representation isomorphic to headed constituency trees. We show how to convert the original trees from the RST corpus, RST-DT, and their binarized versions used by all existing RST parsers to head-ordered dependency trees. We also propose a way to convert existing simple dependency parser output to constituent trees. This allows us to evaluate and to compare approaches from both constituent-based and dependency-based perspectives in a unified framework, using constituency and dependency metrics. We thus propose an evaluation framework to compare extant approaches easily and uniformly, something the RST parsing community has lacked up to now. We can also compare parsers’ predictions to each other across frameworks. This allows us to characterize families of parsing strategies across the different frameworks, in particular with respect to the notion of headedness. Our experiments provide evidence for the conceptual similarities between dependency parsers and shift-reduce constituency parsers, and confirm that dependency parsing constitutes a viable approach to RST discourse parsing.
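To make the notion of headedness more concrete, here is a minimal sketch of turning a headed constituency tree into plain dependencies. It assumes one common convention, namely that the head of a span is the head of its leftmost nucleus, and a hypothetical tree encoding (an integer for an elementary discourse unit, a list of `(role, subtree)` pairs for an internal node). It drops relation labels and the ordering information that head-ordered dependency trees carry, so it should not be read as the article's full conversion.

```python
def head(tree):
    """Return the head EDU of a tree: the head of its leftmost nucleus."""
    if isinstance(tree, int):  # a leaf is a single EDU index
        return tree
    for role, child in tree:
        if role == "N":
            return head(child)
    raise ValueError("every internal node needs at least one nucleus")


def to_dependencies(tree, deps=None):
    """Map each non-root EDU to its governor EDU.

    At every internal node, the head of each child other than the
    head-bearing child becomes a dependent of the node's own head.
    """
    if deps is None:
        deps = {}
    if isinstance(tree, int):
        return deps
    node_head = head(tree)
    for _role, child in tree:
        child_head = head(child)
        if child_head != node_head:
            deps[child_head] = node_head
        to_dependencies(child, deps)
    return deps


if __name__ == "__main__":
    # (EDU 1 is the nucleus, EDU 2 its satellite), with EDU 3 a satellite of that span.
    example = [("N", [("N", 1), ("S", 2)]), ("S", 3)]
    print(to_dependencies(example))
```

In this example EDU 1 heads the whole text, and EDUs 2 and 3 both depend on it, which shows how two different constituency bracketings can collapse to the same unordered dependency structure.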

2017

Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication
Nicholas Asher | Julie Hunter | Alex Lascarides
Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication

How much progress have we made on RST discourse parsing? A replication study of recent results on the RST-DT
Mathieu Morey | Philippe Muller | Nicholas Asher
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This article evaluates purported progress over the past years in RST discourse parsing. Several studies report a relative error reduction of 24 to 51% on all metrics that the authors attribute to the introduction of distributed representations of discourse units. We replicate the standard evaluation of 9 parsers, 5 of which use distributed representations, from 8 studies published between 2013 and 2017, using their predictions on the test set of the RST-DT. Our main finding is that most recently reported increases in RST discourse parser performance are an artefact of differences in implementations of the evaluation procedure. We evaluate all these parsers with the standard Parseval procedure to provide a more accurate picture of the actual performance of RST discourse parsers in standard evaluation settings. Under this more stringent procedure, the gains attributable to distributed representations represent at most a 16% relative error reduction on fully-labelled structures.
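The "relative error reduction" quoted in this abstract can be computed as shown in the short sketch below. It assumes scores expressed as fractions in [0, 1], with error defined as one minus the score; the example figures are made up to illustrate the arithmetic and are not taken from the paper.

```python
def relative_error_reduction(baseline_score, new_score):
    """Fraction of the baseline's error that the new system removes.

    Scores are assumed to lie in [0, 1], with error = 1 - score.
    """
    baseline_error = 1.0 - baseline_score
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error left to reduce")
    new_error = 1.0 - new_score
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical scores: moving from 0.50 to 0.58 removes about 16% of the error.
    print(round(relative_error_reduction(0.50, 0.58), 2))
```

Because the measure is relative to the remaining error, the same absolute gain counts for more when the baseline is already strong, which is one reason differing evaluation procedures can make reported reductions hard to compare.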

2016

Parallel Discourse Annotations on a Corpus of Short Texts
Manfred Stede | Stergos Afantenos | Andreas Peldszus | Nicholas Asher | Jérémy Perret
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present the first corpus of texts annotated with two alternative approaches to discourse structure, Rhetorical Structure Theory (Mann and Thompson, 1988) and Segmented Discourse Representation Theory (Asher and Lascarides, 2003). 112 short argumentative texts have been analyzed according to these two theories. Furthermore, in previous work, the same texts have already been annotated for their argumentation structure, according to the scheme of Peldszus and Stede (2013). This corpus therefore enables studies of correlations between the two accounts of discourse structure, and between discourse and argumentation. We converted the three annotation formats to a common dependency tree format that makes it possible to compare the structures, and we describe some initial findings.

Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
Nicholas Asher | Julie Hunter | Mathieu Morey | Farah Benamara | Stergos Afantenos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the STAC resource, a corpus of multi-party chats annotated for discourse structure in the style of SDRT (Asher and Lascarides, 2003; Lascarides and Asher, 2009). The main goal of the STAC project is to study the discourse structure of multi-party dialogues in order to understand the linguistic strategies adopted by interlocutors to achieve their conversational goals, especially when these goals are opposed. The STAC corpus is not only a rich source of data on strategic conversation, but also the first corpus that we are aware of that provides full discourse structures for multi-party dialogues. It has other remarkable features that make it an interesting resource for other topics: interleaved threads, creative language, and interactions between linguistic and extra-linguistic contexts.

Integer Linear Programming for Discourse Parsing
Jérémy Perret | Stergos Afantenos | Nicholas Asher | Mathieu Morey
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Integrating Type Theory and Distributional Semantics: A Case Study on Adjective–Noun Compositions
Nicholas Asher | Tim Van de Cruys | Antoine Bride | Márta Abrusán
Computational Linguistics, Volume 42, Issue 4 - December 2016

2012

This paper describes the ANNODIS resource, a discourse-level annotated corpus for French. The corpus combines two perspectives on discourse: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that are diversified with respect to genre, length and type of discursive organisation. The methodology followed here involves an iterative design of annotation guidelines in order to reach satisfactory inter-annotator agreement levels. This allows us to raise a few issues relevant for the comparison of such complex objects as discourse structures. The corpus also serves as a source of empirical evidence for discourse theories. We present two initial analyses that take advantage of this new annotated corpus: one that tested hypotheses on constraints governing discourse structure, and another that studied the variations in composition and signalling of multi-level discourse structures.

Extraction de préférences à partir de dialogues de négociation (Towards Preference Extraction From Negotiation Dialogues) [in French]
Anaïs Cadilhac | Farah Benamara | Vladimir Popescu | Nicholas Asher | Mohamadou Seck
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Annotating Preferences in Chats for Strategic Games
Anaïs Cadilhac | Nicholas Asher | Farah Benamara
Proceedings of the Sixth Linguistic Annotation Workshop

How do Negation and Modality Impact on Opinions?
Farah Benamara | Baptiste Chardon | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

Annotating Preferences in Negotiation Dialogues
Anaïs Cadilhac | Nicholas Asher | Farah Benamara
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Constrained Decoding for Text-Level Discourse Parsing
Philippe Muller | Stergos Afantenos | Pascal Denis | Nicholas Asher
Proceedings of COLING 2012

2011

Le corpus ANNODIS, un corpus enrichi d’annotations discursives [The ANNODIS corpus, a corpus enriched with discourse annotations]
Marie-Paule Péry-Woodley | Stergos D. Afantenos | Lydia-Mai Ho-Dac | Nicholas Asher
Traitement Automatique des Langues 2011 Volume 52 Numéro 3

Theorie et Praxis Une optique sur les travaux en TAL sur le discours et le dialogue (Theory and Praxis A view on the NLP works in discourse and dialogue)
Nicholas Asher
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Conférences invitées

Commitments to Preferences in Dialogue
Anaïs Cadilhac | Nicholas Asher | Farah Benamara | Alex Lascarides
Proceedings of the SIGDIAL 2011 Conference

2010

Testing SDRT’s Right Frontier
Stergos Afantenos | Nicholas Asher
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

The ANNODIS project aims to build a corpus of texts annotated at the discourse level and to develop tools for corpus annotation and exploitation. The annotations adopt two complementary points of view: a bottom-up perspective starts from minimal discourse units and builds complex structures via a set of discourse relations; a top-down perspective approaches the text as a whole and relies on pre-identified cues to detect high-level discourse structures. The construction of the corpus is accompanied by the creation of two interfaces: the first assists manual annotation of discourse relations and structures by displaying the markup produced by preprocessing; the second will be dedicated to exploiting the annotations. We present the annotation models and protocols developed to carry out the annotation campaign through the dedicated interface.