Mathieu Morey

2019

pdf abs
ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents
Philippe Muller | Chloé Braud | Mathieu Morey
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Segmentation is the first step in building practical discourse parsers, and is often neglected in discourse parsing studies. The goal is to identify the minimal spans of text to be linked by discourse relations, or to isolate explicit marking of discourse relations. Existing systems on English report F1 scores as high as 95%, but they generally assume gold sentence boundaries and are restricted to English newswire texts annotated within the RST framework. This article presents a generic approach and a system, ToNy, a discourse segmenter developed for the DisRPT shared task where multiple discourse representation schemes, languages and domains are represented. In our experiments, we found that a straightforward sequence prediction architecture with pretrained contextual embeddings is sufficient to reach performance levels comparable to existing systems, when separately trained on each corpus. We report performance between 81% and 96% in F1 score. We also observed that discourse segmentation models only display a moderate generalization capability, even within the same language and discourse representation scheme.

2018

pdf bib abs
A Dependency Perspective on RST Discourse Parsing and Evaluation
Mathieu Morey | Philippe Muller | Nicholas Asher
Computational Linguistics, Volume 44, Issue 2 - June 2018

Computational text-level discourse analysis mostly happens within Rhetorical Structure Theory (RST), whose structures have classically been presented as constituency trees, and relies on data from the RST Discourse Treebank (RST-DT); as a result, the RST discourse parsing community has largely borrowed from the syntactic constituency parsing community. The standard evaluation procedure for RST discourse parsers is thus a simplified variant of PARSEVAL, and most RST discourse parsers use techniques that originated in syntactic constituency parsing. In this article, we isolate a number of conceptual and computational problems with the constituency hypothesis. We then examine the consequences, for the implementation and evaluation of RST discourse parsers, of adopting a dependency perspective on RST structures, a view advocated so far only by a few approaches to discourse parsing. While doing that, we show the importance of the notion of headedness of RST structures. We analyze RST discourse parsing as dependency parsing by adapting to RST a recent proposal in syntactic parsing that relies on head-ordered dependency trees, a representation isomorphic to headed constituency trees. We show how to convert the original trees from the RST corpus, RST-DT, and their binarized versions used by all existing RST parsers to head-ordered dependency trees. We also propose a way to convert existing simple dependency parser output to constituent trees. This allows us to evaluate and to compare approaches from both constituent-based and dependency-based perspectives in a unified framework, using constituency and dependency metrics. We thus propose an evaluation framework to compare extant approaches easily and uniformly, something the RST parsing community has lacked up to now. We can also compare parsers’ predictions to each other across frameworks. This allows us to characterize families of parsing strategies across the different frameworks, in particular with respect to the notion of headedness. Our experiments provide evidence for the conceptual similarities between dependency parsers and shift-reduce constituency parsers, and confirm that dependency parsing constitutes a viable approach to RST discourse parsing.

2017

pdf abs
How much progress have we made on RST discourse parsing? A replication study of recent results on the RST-DT
Mathieu Morey | Philippe Muller | Nicholas Asher
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This article evaluates purported progress over the past years in RST discourse parsing. Several studies report a relative error reduction of 24 to 51% on all metrics that authors attribute to the introduction of distributed representations of discourse units. We replicate the standard evaluation of 9 parsers, 5 of which use distributed representations, from 8 studies published between 2013 and 2017, using their predictions on the test set of the RST-DT. Our main finding is that most recently reported increases in RST discourse parser performance are an artefact of differences in implementations of the evaluation procedure. We evaluate all these parsers with the standard Parseval procedure to provide a more accurate picture of the actual RST discourse parsers performance in standard evaluation settings. Under this more stringent procedure, the gains attributable to distributed representations represent at most a 16% relative error reduction on fully-labelled structures.

2016

pdf
Integer Linear Programming for Discourse Parsing
Jérémy Perret | Stergos Afantenos | Nicholas Asher | Mathieu Morey
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf abs
Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
Nicholas Asher | Julie Hunter | Mathieu Morey | Benamara Farah | Stergos Afantenos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the STAC resource, a corpus of multi-party chats annotated for discourse structure in the style of SDRT (Asher and Lascarides, 2003; Lascarides and Asher, 2009). The main goal of the STAC project is to study the discourse structure of multi-party dialogues in order to understand the linguistic strategies adopted by interlocutors to achieve their conversational goals, especially when these goals are opposed. The STAC corpus is not only a rich source of data on strategic conversation, but also the first corpus that we are aware of that provides full discourse structures for multi-party dialogues. It has other remarkable features that make it an interesting resource for other topics: interleaved threads, creative language, and interactions between linguistic and extra-linguistic contexts.

2012

pdf bib
Grew : un outil de réécriture de graphes pour le TAL (Grew: a Graph Rewriting Tool for NLP) [in French]
Bruno Guillaume | Guillame Bonfante | Paul Masson | Mathieu Morey | Guy Perrier
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 5: Software Demonstrations

2011

pdf abs
Enrichissement de structures en dépendances par réécriture de graphes (Dependency structure enrichment using graph rewriting)
Guillaume Bonfante | Bruno Guillaume | Mathieu Morey | Guy Perrier
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Nous montrons comment enrichir une annotation en dépendances syntaxiques au format du French Treebank de Paris 7 en utilisant la réécriture de graphes, en vue du calcul de sa représentation sémantique. Le système de réécriture est composé de règles grammaticales et lexicales structurées en modules. Les règles lexicales utilisent une information de contrôle extraite du lexique des verbes français Dicovalence.

pdf
Modular Graph Rewriting to Compute Semantics
Guillaume Bonfante | Bruno Guillaume | Mathieu Morey | Guy Perrier
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

pdf abs
Réécriture de graphes de dépendances pour l’interface syntaxe-sémantique
Guillaume Bonfante | Bruno Guillaume | Mathieu Morey | Guy Perrier
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Nous définissons le beta-calcul, un calcul de réécriture de graphes, que nous proposons d’utiliser pour étudier les liens entre différentes représentations linguistiques. Nous montrons comment transformer une analyse syntaxique en une représentation sémantique par la composition de deux jeux de règles de beta-calcul. Le premier souligne l’importance de certaines informations syntaxiques pour le calcul de la sémantique et explicite le lien entre syntaxe et sémantique sous-spécifiée. Le second décompose la recherche de modèles pour les représentations sémantiques sous-spécifiées.