Martial Pastor


2024

La reconnaissance automatique des relations de cohérence RST en français.
Martial Pastor | Erik Bran Marino | Nelleke Oostdijk
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

Discourse parsers have attracted considerable interest in recent natural language processing applications. This approach goes beyond the traditional limits of the sentence and can be extended to include the identification of discourse relations. Several parsers specialize in automatic discourse processing, but they have mainly been evaluated on English corpora. As a result, it is not clear which linguistic cues parsers rely on to classify discourse relations outside English. This article evaluates the performance of the DMRST parser on the RST-DT corpus translated into French. We find that discourse relation classification performance in French is comparable to that obtained for other languages. By analyzing the successes and failures of relation classification, we highlight the impact of discourse markers and syntactic structures on the parser’s accuracy.

Signals as Features: Predicting Error/Success in Rhetorical Structure Parsing
Martial Pastor | Nelleke Oostdijk
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)

This study introduces an approach for evaluating the importance of the signals proposed by Das and Taboada in discourse parsing. Previous studies using other signals indicate that discourse markers (DMs) are not consistently reliable cues and can act as distractors, complicating relation recognition. The study explores the effectiveness of alternative signal types, such as syntactic and genre-related signals, revealing their efficacy even when they are not predominant for specific relations. An experiment incorporating RST signals as features for a parser error/success prediction model demonstrates their relevance and provides insights into signal combinations that prevent (or facilitate) accurate relation recognition. The observations also identify challenges and potential confusion posed by specific signals. The study produced publicly available code and data, contributing accessible resources for research on RST signals in discourse parsing.

2023

EvoSem: A database of polysemous cognate sets
Mathieu Dehouck | Alex François | Siva Kalyan | Martial Pastor | David Kletz
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

Polysemies, or “colexifications”, are of great interest in cognitive and historical linguistics, since meanings that are frequently expressed by the same lexeme are likely to be conceptually similar and to lie along a common pathway of semantic change. We argue that these types of inferences can be more reliably drawn from polysemies of cognate sets (which we call “dialexifications”) than from polysemies of lexemes. After giving a precise definition of dialexification, we introduce EvoSem, a cross-linguistic database of etymologies scraped from several online sources. Based on this database, we measure for each pair of senses how many cognate sets include them both, i.e. how often this pair of senses is “dialexified”. This allows us to construct a weighted dialexification graph for any set of senses, indicating the conceptual and historical closeness of each pair. We also present an online interface for browsing our database, including graphs and interactive tables. We then discuss potential applications to NLP tasks and to linguistic research.
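
The pair-counting step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `dialexification_weights`, the input layout (a mapping from cognate-set identifier to a set of senses), and the toy data are all assumptions made for illustration, and the real database schema may differ.

```python
from collections import Counter
from itertools import combinations


def dialexification_weights(cognate_sets):
    """Count, for each unordered pair of senses, how many cognate sets contain both.

    `cognate_sets` maps a cognate-set identifier to an iterable of senses
    (hypothetical layout, assumed for this sketch).
    """
    weights = Counter()
    for senses in cognate_sets.values():
        # Sort so each pair always appears in the same order, e.g. ("month", "moon").
        for a, b in combinations(sorted(set(senses)), 2):
            weights[(a, b)] += 1
    return weights


# Invented toy data, not taken from the database.
toy_sets = {
    "set_1": {"moon", "month"},
    "set_2": {"moon", "month", "shine"},
    "set_3": {"month", "season"},
}

weights = dialexification_weights(toy_sets)

# Each entry is one weighted edge of the graph: (sense_a, sense_b) -> count.
for (a, b), w in sorted(weights.items()):
    print(f"{a} -- {b}: {w}")
```

Under these assumptions, the resulting counter is an edge list for a weighted graph over senses: a higher count means the two senses co-occur in more cognate sets.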