Aurélien Max

2018

pdf bib abs
Construction of a Multilingual Corpus Annotated with Translation Relations
Yuming Zhai | Aurélien Max | Anne Vilnat
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

Translation relations, which distinguish literal translation from other translation techniques, constitute an important subject of study for human translators (Chuquet and Paillard, 1989). However, automatic processing techniques based on interlingual relations, such as machine translation or paraphrase generation exploiting translational equivalence, have not exploited these relations explicitly until now. In this work, we present a categorisation of translation relations and annotate them in a parallel multilingual (English, French, Chinese) corpus of oral presentations, the TED Talks. Our long term objective will be to automatically detect these relations in order to integrate them as important characteristics for the search of monolingual segments in relation of equivalence (paraphrases) or of entailment. The annotated corpus resulting from our work will be made available to the community.

2016

pdf bib
LIMSI’s Contribution to the WMT’16 Biomedical Translation Task
Julia Ive | Aurélien Max | François Yvon
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

pdf bib abs
Sentence alignment for literary texts: The state-of-the-art and beyond
Yong Xu | Aurélien Max | François Yvon
Linguistic Issues in Language Technology, Volume 12, 2015 - Literature Lifts up Computational Linguistics

Literary works are becoming increasingly available in electronic formats, thus quickly transforming editorial processes and reading habits. In the context of the global enthusiasm for multilingualism, the rapid spread of e-book readers, such as Amazon Kindle R or Kobo Touch R , fosters the development of a new generation of reading tools for bilingual books. In particular, literary works, when available in several languages, offer an attractive perspective for self-development or everyday leisure reading, but also for activities such as language learning, translation or literary studies. An important issue in the automatic processing of multilingual e-books is the alignment between textual units. Alignment could help identify corresponding text units in different languages, which would be particularly beneficial to bilingual readers and translation professionals. Computing automatic alignments for literary works, however, is a task more challenging than in the case of better behaved corpora such as parliamentary proceedings or technical manuals. In this paper, we revisit the problem of computing high-quality. alignment for literary works. We first perform a large-scale evaluation of automatic alignment for literary texts, which provides a fair assessment of the actual difficulty of this task. We then introduce a two-pass approach, based on a maximum entropy model. Experimental results for novels available in English and French or in English and Spanish demonstrate the effectiveness of our method.

pdf bib
Touch-Based Pre-Post-Editing of Machine Translation Output
Benjamin Marie | Aurélien Max
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Multi-Pass Decoding With Complex Feature Guidance for Statistical Machine Translation
Benjamin Marie | Aurélien Max
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib abs
Incremental development of statistical machine translation systems
Li Gong | Aurélien Max | François Yvon
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

Statistical Machine Translation produces results that make it a competitive option in most machine-assisted translation scenarios. However, these good results often come at a very high computational cost and correspond to training regimes which are unfit to many practical contexts, where the ability to adapt to users and domains and to continuously integrate new data (eg. in post-edition contexts) are of primary importance. In this article, we show how these requirements can be met using a strategy for on-demand word alignment and model estimation. Most remarkably, our incremental system development framework is shown to deliver top quality translation performance even in the absence of tuning, and to surpass a strong baseline when performing online tuning. All these results obtained with great computational savings as compared to conventional systems.

pdf bib
Confidence-based Rewriting of Machine Translation Output
Benjamin Marie | Aurélien Max
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Towards a More Efficient Development of Statistical Machine Translation Systems (Vers un développement plus efficace des systèmes de traduction statistique : un peu de vert dans un monde de BLEU) [in French]
Li Gong | Aurélien Max | François Yvon
Proceedings of TALN 2014 (Volume 2: Short Papers)

pdf bib
(Much) Faster Construction of SMT Phrase Tables from Large-scale Parallel Corpora (Construction (très) rapide de tables de traduction à partir de grands bi-textes) [in French]
Li Gong | Aurélien Max | François Yvon
Proceedings of TALN 2014 (Volume 3: System Demonstrations)

2013

pdf bib abs
Improving bilingual sub-sentential alignment by sampling-based transpotting
Li Gong | Aurélien Max | François Yvon
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

In this article, we present a sampling-based approach to improve bilingual sub-sentential alignment in parallel corpora. This approach can be used to align parallel sentences on an as needed basis, and is able to accurately align newly available sentences. We evaluate the resulting alignments on several Machine Translation tasks. Results show that for the tasks considered here, our approach performs on par with the state-of-the-art statistical alignment pipeline giza++/Moses, and obtains superior results in a number of configurations, notably when aligning additional parallel sentence pairs carefully selected to match the test input.

pdf bib abs
A study in greedy oracle improvement of translation hypotheses
Benjamin Marie | Aurélien Max
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

This paper describes a study of translation hypotheses that can be obtained by iterative, greedy oracle improvement from the best hypothesis of a state-of-the-art phrase-based Statistical Machine Translation system. The factors that we consider include the influence of the rewriting operations, target languages, and training data sizes. Analysis of our results provide new insights into some previously unanswered questions, which include the reachability of previously unreachable hypotheses via indirect translation (thanks to the introduction of a rewrite operation on the source text), and the potential translation performance of systems relying on pruned phrase tables.

pdf bib
LIMSI’s participation to the 2013 shared task on Native Language Identification
Thomas Lavergne | Gabriel Illouz | Aurélien Max | Ryo Nagata
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

2012

pdf bib abs
Towards contextual adaptation for any-text translation
Li Gong | Aurélien Max | François Yvon
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

Adaptation for Machine Translation has been studied in a variety of ways, using an ideal scenario where the training data can be split into ”out-of-domain” and ”in-domain” corpora, on which the adaptation is based. In this paper, we consider a more realistic setting which does not assume the availability of any kind of ”in-domain” data, hence the name ”any-text translation”. In this context, we present a new approach to contextually adapt a translation model onthe-fly, and present several experimental results where this approach outperforms conventionaly trained baselines. We also present a document-level contrastive evaluation whose results can be easily interpreted, even by non-specialists.

pdf bib
Étude bilingue de l’acquisition et de la validation automatiques de paraphrases sous-phrastiques [Automatic acquisition and validation of sub-sentential paraphrases : a bilingual study]
Houda Bouamor | Aurélien Max | Anne Vilnat
Traitement Automatique des Langues, Volume 53, Numéro 1 : Varia [Varia]

pdf bib
Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs
Aurélien Max | Houda Bouamor | Anne Vilnat
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Validation of sub-sentential paraphrases acquired from parallel monolingual corpora
Houda Bouamor | Aurélien Max | Anne Vilnat
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Extraction d’information automatique en domaine médical par projection inter-langue : vers un passage à l’échelle (Automatic Information Extraction in the Medical Domain by Cross-Lingual Projection) [in French]
Asma Ben Abacha | Pierre Zweigenbaum | Aurélien Max
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib
Validation sur le Web de reformulations locales: application à la Wikipédia (Assisted Rephrasing for Wikipedia Contributors through Web-based Validation) [in French]
Houda Bouamor | Aurélien Max | Gabriel Illouz | Anne Vilnat
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib
Une étude en 3D de la paraphrase: types de corpus, langues et techniques (A Study of Paraphrase along 3 Dimensions : Corpus Types, Languages and Techniques) [in French]
Houda Bouamor | Aurélien Max | Anne Vilnat
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib abs
A contrastive review of paraphrase acquisition techniques
Houda Bouamor | Aurélien Max | Gabriel Illouz | Anne Vilnat
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper addresses the issue of what approach should be used for building a corpus of sententential paraphrases depending on one's requirements. Six strategies are studied: (1) multiple translations into a single language from another language; (2) multiple translations into a single language from different other languages; (3) multiple descriptions of short videos; (4) multiple subtitles for the same language; (5) headlines for similar news articles; and (6) sub-sentential paraphrasing in the context of a Web-based game. We report results on French for 50 paraphrase pairs collected for all these strategies, where corpora were manually aligned at the finest possible level to define oracle performance in terms of accessible sub-sentential paraphrases. The differences observed will be used as criteria for motivating the choice of a given approach before attempting to build a new paraphrase corpus.

pdf bib
Aligning Bilingual Literary Works: a Pilot Study
Qian Yu | Aurélien Max | François Yvon
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature

pdf bib
WSD for n-best reranking and local language modeling in SMT
Marianna Apidianaki | Guillaume Wisniewski | Artem Sokolov | Aurélien Max | François Yvon
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

2011

pdf bib abs
Paraphrases et modifications locales dans l’historique des révisions de Wikipédia (Paraphrases and local changes in the revision history of Wikipedia)
Camille Dutrey | Houda Bouamor | Delphine Bernhard | Aurélien Max
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cet article, nous analysons les modifications locales disponibles dans l’historique des révisions de la version française de Wikipédia. Nous définissons tout d’abord une typologie des modifications fondée sur une étude détaillée d’un large corpus de modifications. Puis, nous détaillons l’annotation manuelle d’une partie de ce corpus afin d’évaluer le degré de complexité de la tâche d’identification automatique de paraphrases dans ce genre de corpus. Enfin, nous évaluons un outil d’identification de paraphrases à base de règles sur un sous-ensemble de notre corpus.

pdf bib abs
Combinaison d’informations pour l’alignement monolingue (Information combination for monolingual alignment)
Houda Bouamor | Aurélien Max | Anne Vilnat
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cet article, nous décrivons une nouvelle méthode d’alignement automatique de paraphrases d’énoncés. Nous utilisons des méthodes développées précédemment afin de produire différentes approches hybrides (hybridations). Ces différentes méthodes permettent d’acquérir des équivalences textuelles à partir d’un corpus monolingue parallèle. L’hybridation combine des informations obtenues par diverses techniques : alignements statistiques, approche symbolique, fusion d’arbres syntaxiques et alignement basé sur des distances d’édition. Nous avons évalué l’ensemble de ces résultats et nous constatons une amélioration sur l’acquisition de paraphrases sous-phrastiques.

pdf bib
Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs
Houda Bouamor | Aurélien Max | Anne Vilnat
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Web-based Validation for Contextual Targeted Paraphrasing
Houda Bouamor | Aurélien Max | Gabriel Illouz | Anne Vilnat
Proceedings of the Workshop on Monolingual Text-To-Text Generation

2010

pdf bib abs
Multi-pivot translation by system combination
Gregor Leusch | Aurélien Max | Josep Maria Crego | Hermann Ney
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

This paper describes a technique to exploit multiple pivot languages when using machine translation (MT) on language pairs with scarce bilingual resources, or where no translation system for a language pair is available. The principal idea is to generate intermediate translations in several pivot languages, translate them separately into the target language, and generate a consensus translation out of these using MT system combination techniques. Our technique can also be applied when a translation system for a language pair is available, but is limited in its translation accuracy because of scarce resources. Using statistical MT systems for the 11 different languages of Europarl, we show experimentally that a direct translation system can be replaced by this pivot approach without a loss in translation quality if about six pivot languages are available. Furthermore, we can already improve an existing MT system by adding two pivot systems to it. The maximum improvement was found to be 1.4% abs. in BLEU in our experiments for 8 or more pivot languages.

pdf bib abs
Recueil et analyse d’un corpus écologique de corrections orthographiques extrait des révisions de Wikipédia
Guillaume Wisniewski | Aurélien Max | François Yvon
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cet article, nous introduisons une méthode à base de règles permettant d’extraire automatiquement de l’historique des éditions de l’encyclopédie collaborative Wikipédia des corrections orthographiques. Cette méthode nous a permis de construire un corpus d’erreurs composé de 72 483 erreurs lexicales (non-word errors) et 74 100 erreurs grammaticales (real-word errors). Il n’existe pas, à notre connaissance, de plus gros corpus d’erreurs écologiques librement disponible. En outre, les techniques mises en oeuvre peuvent être facilement transposées à de nombreuses autres langues. La collecte de ce corpus ouvre de nouvelles perspectives pour l’étude des erreurs fréquentes ainsi que l’apprentissage et l’évaluation des correcteurs orthographiques automatiques. Plusieurs expériences illustrant son intérêt sont proposées.

pdf bib abs
Acquisition de paraphrases sous-phrastiques depuis des paraphrases d’énoncés
Houda Bouamor | Aurélien Max | Anne Vilnat
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Dans cet article, nous présentons la tâche d’acquisition de paraphrases sous-phrastiques (impliquant des paires de mots ou de groupes de mots), et décrivons plusieurs techniques opérant à différents niveaux. Nous décrivons une évaluation visant à comparer ces techniques et leurs combinaisons sur deux corpus de paraphrases d’énoncés obtenus par traduction multiple. Les conclusions que nous tirons peuvent servir de guide pour améliorer des techniques existantes.

pdf bib
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues
Alexandre Patry | Philippe Langlais | Aurélien Max
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues

pdf bib
Micro-adaptation lexicale en traduction automatique statistique [Lexical Micro-adaptation in Statistical Machine Translation]
Josep Maria Crego | Gregor Leusch | Aurélien Max | Hermann Ney | François Yvon
Traitement Automatique des Langues, Volume 51, Numéro 2 : Multilinguisme et traitement automatique des langues [Multilingualism and Natural Language Processing]

pdf bib
Local lexical adaptation in Machine Translation through triangulation: SMT helping SMT
Josep Maria Crego | Aurélien Max | François Yvon
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
Aurélien Max
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib abs
Contrastive Lexical Evaluation of Machine Translation
Aurélien Max | Josep Maria Crego | François Yvon
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper advocates a complementary measure of translation performance that focuses on the constrastive ability of two or more systems or system versions to adequately translate source words. This is motivated by three main reasons : 1) existing automatic metrics sometimes do not show significant differences that can be revealed by fine-grained focussed human evaluation, 2) these metrics are based on direct comparisons between system hypotheses with the corresponding reference translations, thus ignoring the input words that were actually translated, and 3) as these metrics do not take input hypotheses from several systems at once, fine-grained contrastive evaluation can only be done indirectly. This proposal is illustrated on a multi-source Machine Translation scenario where multiple translations of a source text are available. Significant gains (up to +1.3 BLEU point) are achieved on these experiments, and contrastive lexical evaluation is shown to provide new information that can help to better analyse a system's performance.

pdf bib abs
Mining Naturally-occurring Corrections and Paraphrases from Wikipedia’s Revision History
Aurélien Max | Guillaume Wisniewski
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also prove interesting material for linguistic studies. In this article, we present WiCoPaCo (Wikipedia Correction and Paraphrase Corpus), a new freely-available resource built by automatically mining Wikipedias revision history. The WiCoPaCo corpus focuses on local modifications made by human revisors and include various types of corrections (such as spelling error or typographical corrections) and rewritings, which can be categorized broadly into meaning-preserving and meaning-altering revisions. We present an initial hand-built typology of these revisions, but the resource allows for any possible annotation scheme. We discuss the main motivations for building such a resource and describe the main technical details guiding its construction. We also present applications and data analysis on French and report initial results on spelling error correction and morphosyntactic rewriting. The WiCoPaCo corpus can be freely downloaded from http://wicopaco.limsi.fr.

2009

pdf bib abs
Prise en compte de dépendances syntaxiques pour la traduction contextuelle de segments
Aurélien Max | Rafik Maklhoufi | Philippe Langlais
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans un système standard de traduction statistique basé sur les segments, le score attribué aux différentes traductions d’un segment ne dépend pas du contexte dans lequel il apparaît. Plusieurs travaux récents tendent à montrer l’intérêt de prendre en compte le contexte source lors de la traduction, mais ces études portent sur des systèmes traduisant vers l’anglais, une langue faiblement fléchie. Dans cet article, nous décrivons nos expériences sur la prise en compte du contexte source dans un système statistique traduisant de l’anglais vers le français, basé sur l’approche proposée par Stroppa et al. (2007). Nous étudions l’impact de différents types d’indices capturant l’information contextuelle, dont des dépendances syntaxiques typées. Si les mesures automatiques d’évaluation de la qualité d’une traduction ne révèlent pas de gains significatifs de notre système par rapport à un système à l’état de l’art ne faisant pas usage du contexte, une évaluation manuelle conduite sur 100 phrases choisies aléatoirement est en faveur de notre système. Cette évaluation fait également ressortir que la prise en compte de certaines dépendances syntaxiques est bénéfique à notre système.

pdf bib abs
Plusieurs langues (bien choisies) valent mieux qu’une : traduction statistique multi-source par renforcement lexical
Josep Maria Crego | Aurélien Max | François Yvon
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Les systèmes de traduction statistiques intègrent différents types de modèles dont les prédictions sont combinées, lors du décodage, afin de produire les meilleures traductions possibles. Traduire correctement des mots polysémiques, comme, par exemple, le mot avocat du français vers l’anglais (lawyer ou avocado), requiert l’utilisation de modèles supplémentaires, dont l’estimation et l’intégration s’avèrent complexes. Une alternative consiste à tirer parti de l’observation selon laquelle les ambiguïtés liées à la polysémie ne sont pas les mêmes selon les langues source considérées. Si l’on dispose, par exemple, d’une traduction vers l’espagnol dans laquelle avocat a été traduit par aguacate, alors la traduction de ce mot vers l’anglais n’est plus ambiguë. Ainsi, la connaissance d’une traduction français!espagnol permet de renforcer la sélection de la traduction avocado pour le système français!anglais. Dans cet article, nous proposons d’utiliser des documents en plusieurs langues pour renforcer les choix lexicaux effectués par un système de traduction automatique. En particulier, nous montrons une amélioration des performances sur plusieurs métriques lorsque les traductions auxiliaires utilisées sont obtenues manuellement.

pdf bib abs
Amener des utilisateurs à créer et évaluer des paraphrases par le jeu
Houda Bouamor | Aurélien Max | Anne Vilnat
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

Dans cet article, nous présentons une application sur le web pour l’acquisition de paraphrases phrastiques et sous-phrastiques sous forme de jeu. L’application permet l’acquisition à la fois de paraphrases et de jugements humains multiples sur ces paraphrases, ce qui constitue des données particulièrement utiles pour les applications du TAL basées sur les phénomènes paraphrastiques.

pdf bib
LIMSI‘s Statistical Translation Systems for WMT‘09
Alexandre Allauzen | Josep Crego | Aurélien Max | François Yvon
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Sub-sentencial Paraphrasing by Contextual Pivot Translation
Aurélien Max
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

2008

pdf bib
Explorations in using grammatical dependencies for contextual phrase translation disambiguation
Aurélien Max | Rafik Makhloufi | Philippe Langlais
Proceedings of the 12th Annual Conference of the European Association for Machine Translation

pdf bib abs
Génération de reformulations locales par pivot pour l’aide à la révision
Aurélien Max
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente une approche pour obtenir des paraphrases pour de courts segments de texte qui peuvent aider un rédacteur à reformuler localement des textes. La ressource principale utilisée est une table d’alignements bilingues de segments d’un système de traduction automatique statistique. Un segment marqué par le rédacteur est tout d’abord traduit dans une langue pivot avant d’être traduit à nouveau dans la langue d’origine, ce qui est permis par la nature même de la ressource bilingue utilisée sans avoir recours à un processus de traduction complet. Le cadre proposé permet l’intégration et la combinaison de différents modèles d’estimation de la qualité des paraphrases. Des modèles linguistiques tentant de prendre en compte des caractéristiques des paraphrases de courts segments de textes sont proposés, et une évaluation est décrite et ses résultats analysés. Les domaines d’application possibles incluent, outre l’aide à la reformulation, le résumé et la réécriture des textes pour répondre à des conventions ou à des préférences stylistiques. L’approche est critiquée et des perspectives d’amélioration sont proposées.

pdf bib abs
An Evaluation of Spoken and Textual Interaction in the RITEL Interactive Question Answering System
Dave Toney | Sophie Rosset | Aurélien Max | Olivier Galibert | Eric Bilinski
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The RITEL project aims to integrate a spoken language dialogue system and an open-domain information retrieval system in order to enable human users to ask a general question and to refine their search for information interactively. This type of system is often referred to as an Interactive Question Answering (IQA) system. In this paper, we present an evaluation of how the performance of the RITEL system differs when users interact with it using spoken versus textual input and output. Our results indicate that while users do not perceive the two versions to perform significantly differently, many more questions are asked in a typical text-based dialogue.

pdf bib
Looking up phrase rephrasings via a pivot language
Aurélien Max | Michael Zock
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)

2006

pdf bib
Contraindre le fond et la forme en domaine contraint: la normalisation de documents [Constraining form and content in a constrained domain: document normalization]
Aurélien Max
Traitement Automatique des Langues, Volume 47, Numéro 2 : Discours et document : traitements automatiques [Computational Approaches to Discourse and Document Processing]

2005

pdf bib abs
Simplification interactive pour la production de textes adaptés aux personnes souffrant de troubles de la compréhension
Aurélien Max
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Cet article traite du problème de la compréhensibilité des textes et en particulier du besoin de simplifier la complexité syntaxique des phrases pour des lecteurs souffrant de troubles de la compréhension. Nous présentons une approche à base de règles de simplification développées manuellement et son intégration dans un traitement de texte. Cette intégration permet la validation interactive de simplifications candidates produites par le système, et lie la tâche de création de texte simplifié à celle de rédaction.

La problématique de la normalisation de documents est introduite et illustrée par des exemples issus de notices pharmaceutiques. Un paradigme pour l’analyse du contenu des documents est proposé. Ce paradigme se base sur la spécification formelle de la sémantique des documents et utilise une notion de similarité floue entre les prédictions textuelles d’un générateur de texte et le texte du document à analyser. Une implémentation initiale du paradigme est présentée.