2021
pdf
abs
SpanAlign: Efficient Sequence Tagging Annotation Projection into Translated Data applied to Cross-Lingual Opinion Mining
Léo Jacqmin
|
Gabriel Marzinotto
|
Justyna Gromada
|
Ewelina Szczekocka
|
Robert Kołodyński
|
Géraldine Damnati
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Following the increasing performance of neural machine translation systems, the paradigm of using automatically translated data for cross-lingual adaptation is now studied in several applicative domains. The capacity to accurately project annotations remains however an issue for sequence tagging tasks where annotation must be projected with correct spans. Additionally, when the task implies noisy user-generated text, the quality of translation and annotation projection can be affected. In this paper we propose to tackle multilingual sequence tagging with a new span alignment method and apply it to opinion target extraction from customer reviews. We show that provided suitable heuristics, translated data with automatic span-level annotation projection can yield improvements both for cross-lingual adaptation compared to zero-shot transfer, and data augmentation compared to a multilingual baseline.
2020
pdf
abs
Cross-lingual and Cross-domain Evaluation of Machine Reading Comprehension with Squad and CALOR-Quest Corpora
Delphine Charlet
|
Geraldine Damnati
|
Frederic Bechet
|
Gabriel Marzinotto
|
Johannes Heinecke
Proceedings of the Twelfth Language Resources and Evaluation Conference
Machine Reading received recently a lot of attention thanks to both the availability of very large corpora such as SQuAD or MS MARCO containing triplets (document, question, answer), and the introduction of Transformer Language Models such as BERT which obtain excellent results, even matching human performance according to the SQuAD leaderboard. One of the key features of Transformer Models is their ability to be jointly trained across multiple languages, using a shared subword vocabulary, leading to the construction of cross-lingual lexical representations. This feature has been used recently to perform zero-shot cross-lingual experiments where a multilingual BERT model fine-tuned on a machine reading comprehension task exclusively for English was directly applied to Chinese and French documents with interesting performance. In this paper we study the cross-language and cross-domain capabilities of BERT on a Machine Reading Comprehension task on two corpora: SQuAD and a new French Machine Reading dataset, called CALOR-QUEST. The semantic annotation available on CALOR-QUEST allows us to give a detailed analysis on the kinds of questions that are properly handled through the cross-language process. We will try to answer this question: which factor between language mismatch and domain mismatch has the strongest influence on the performances of a Machine Reading Comprehension task?
pdf
abs
FrameNet Annotations Alignment using Attention-based Machine Translation
Gabriel Marzinotto
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet
This paper presents an approach to project FrameNet annotations into other languages using attention-based neural machine translation (NMT) models. The idea is to use an NMT encoder-decoder attention matrix to propose a word-to-word correspondence between the source and the target language. We combine this word alignment along with a set of simple rules to securely project the FrameNet annotations into the target language. We successfully implemented, evaluated and analyzed this technique on the English-to-French configuration. First, we analyze the obtained FrameNet lexicon qualitatively. Then, we use existing French FrameNet corpora to assert the quality of the translation. Finally, we trained a BERT-based FrameNet parser using the projected annotations and compared it to a BERT baseline. Results show substantial improvements in the French language, giving evidence to support that our approach could help to propagate FrameNet data-set on other languages.
pdf
abs
Analyse automatique en cadres sémantiques pour l’apprentissage de modèles de compréhension de texte (Semantic Frame Parsing for training Machine Reading Comprehension models)
Gabriel Marzinotto
|
Delphine Charlet
|
Géraldine Damnati
|
Frédéric Béchet
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles
Dans le cadre de la compréhension automatique de documents, cet article propose une évaluation intrinsèque et extrinsèque d’un modèle d’analyse automatique en cadres sémantiques (Frames). Le modèle proposé est un modèle état de l’art à base de GRU bi-directionnel, enrichi par l’utilisation d’embeddings contextuels. Nous montrons qu’un modèle de compréhension de documents appris sur un corpus de triplets générés à partir d’un corpus analysé automatiquement avec l’analyseur en cadre sémantique présente des performances inférieures de seulement 2.5% en relatif par rapport à un modèle appris sur un corpus de triplets générés à partir d’un corpus analysé manuellement.
pdf
abs
Analyse sémantique robuste par apprentissage antagoniste pour la généralisation de domaine (Robust Semantic Parsing with Adversarial Learning for Domain Generalization )
Gabriel Marzinotto
|
Géraldine Damnati
|
Frédéric Béchet
|
Benoît Favre
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux
Nous présentons des résumés en français et en anglais de l’article (Marzinotto et al., 2019) présenté à la conférence North American Chapter of the Association for Computational Linguistics : Human Language Technologies en 2019.
2019
pdf
abs
MaskParse@Deskin at SemEval-2019 Task 1: Cross-lingual UCCA Semantic Parsing using Recursive Masked Sequence Tagging
Gabriel Marzinotto
|
Johannes Heinecke
|
Géraldine Damnati
Proceedings of the 13th International Workshop on Semantic Evaluation
This paper describes our recursive system for SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA. Each recursive step consists of two parts. We first perform semantic parsing using a sequence tagger to estimate the probabilities of the UCCA categories in the sentence. Then, we apply a decoding policy which interprets these probabilities and builds the graph nodes. Parsing is done recursively, we perform a first inference on the sentence to extract the main scenes and links and then we recursively apply our model on the sentence using a masking features that reflects the decisions made in previous steps. Process continues until the terminal nodes are reached. We chose a standard neural tagger and we focus on our recursive parsing strategy and on the cross lingual transfer problem to develop a robust model for the French language, using only few training samples
pdf
abs
Robust Semantic Parsing with Adversarial Learning for Domain Generalization
Gabriel Marzinotto
|
Géraldine Damnati
|
Frédéric Béchet
|
Benoît Favre
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)
This paper addresses the issue of generalization for Semantic Parsing in an adversarial framework. Building models that are more robust to inter-document variability is crucial for the integration of Semantic Parsing technologies in real applications. The underlying question throughout this study is whether adversarial learning can be used to train models on a higher level of abstraction in order to increase their robustness to lexical and stylistic variations. We propose to perform Semantic Parsing with a domain classification adversarial task, covering various use-cases with or without explicit knowledge of the domain. The strategy is first evaluated on a French corpus of encyclopedic documents, annotated with FrameNet, in an information retrieval perspective. This corpus constitutes a new public benchmark, gathering documents from various thematic domains and various sources. We show that adversarial learning yields improved results when using explicit domain classification as the adversarial task. We also propose an unsupervised domain discovery approach that yields equivalent improvements. The latter is also evaluated on a PropBank Semantic Role Labeling task on the CoNLL-2005 benchmark and is shown to increase the model’s generalization capabilities on out-of-domain data.
2018
pdf
Semantic Frame Parsing for Information Extraction : the CALOR corpus
Gabriel Marzinotto
|
Jeremy Auguste
|
Frederic Bechet
|
Geraldine Damnati
|
Alexis Nasr
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2017
pdf
abs
Analyse automatique FrameNet : une étude sur un corpus français de textes encyclopédiques (FrameNet automatic analysis : a study on a French corpus of encyclopedic texts)
Gabriel Marzinotto
|
Géraldine Damnati
|
Frédéric Béchet
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts
Cet article présente un système d’analyse automatique en cadres sémantiques évalué sur un corpus de textes encyclopédiques d’histoire annotés selon le formalisme FrameNet. L’approche choisie repose sur un modèle intégré d’étiquetage de séquence qui optimise conjointement l’identification des cadres, la segmentation et l’identification des rôles sémantiques associés. Nous cherchons dans cette étude à analyser la complexité de la tâche selon plusieurs dimensions. Une analyse détaillée des performances du système est ainsi proposée, à la fois selon l’angle des paramètres du modèle et de la nature des données.