Marie-Aude Lefer


Vers une analyse des différences interlinguistiques entre les genres textuels : étude de cas basée sur les n-grammes et l’analyse factorielle des correspondances (Towards a cross-linguistic analysis of genres: A case study based on n-grams and Correspondence Analysis)
Marie-Aude Lefer | Yves Bestgen | Natalia Grabar
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)

The aim of our work is to assess the value of using n-grams and correspondence analysis (CA) to compare genres in cross-linguistic contrastive studies. We use a bilingual English-French corpus of comparable original texts. The corpus covers three genres: European parliamentary debates, newspaper editorials and scientific articles. First, n-grams of 2 to 4 words are extracted in each language. Next, for each length, the 1,000 most frequent n-grams in each language are submitted to CA to determine which n-grams are particularly salient in the genres studied. Finally, the n-grams are manually categorized into expressions of opinion and certainty, discourse markers and referential expressions. The results show that n-grams bring to light typical features of the genres studied, as well as interesting cross-linguistic contrasts.
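
The extraction step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenisation, the function names `extract_ngrams` and `contingency_table`, and the toy sentences are all assumptions made for illustration. The sketch counts n-grams of a given length per genre and keeps the most frequent ones, producing the kind of genre-by-n-gram frequency table that a correspondence analysis takes as input.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams of length n as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def contingency_table(texts_by_genre, n, top_k=1000):
    """Count n-grams per genre and keep the top_k most frequent overall.

    Returns (ngrams, table), where table[genre][ngram] is a raw frequency.
    """
    per_genre = {}
    total = Counter()
    for genre, texts in texts_by_genre.items():
        counts = Counter()
        for text in texts:
            # Naive tokenisation (an assumption): lowercase, split on whitespace.
            counts.update(extract_ngrams(text.lower().split(), n))
        per_genre[genre] = counts
        total.update(counts)
    ngrams = [ng for ng, _ in total.most_common(top_k)]
    # A Counter returns 0 for n-grams that never occur in a genre.
    table = {g: {ng: c[ng] for ng in ngrams} for g, c in per_genre.items()}
    return ngrams, table


# Toy English data (hypothetical); a real corpus would be far larger.
toy_corpus = {
    "debates": ["i would like to thank the rapporteur", "i would like to say"],
    "editorials": ["there is no doubt that the government"],
    "articles": ["the results show that the model"],
}

for n in (2, 3, 4):
    ngrams, table = contingency_table(toy_corpus, n, top_k=5)
    print(n, ngrams)
```

Each language would be processed separately, once per n-gram length, and the resulting table passed to a correspondence analysis routine from a statistics package.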


Evaluative prefixes in translation: From automatic alignment to semantic categorization
Marie-Aude Lefer | Natalia Grabar
Linguistic Issues in Language Technology, Volume 11, 2014 - Theoretical and Computational Morphology: New Trends and Synergies

This article aims to assess to what extent translation can shed light on the semantics of French evaluative prefixation by adopting Noël's (2003) ‘translations as evidence for semantics’ approach. In French, evaluative prefixes can be classified along two dimensions (cf. Fradin and Montermini 2009): (1) a quantity dimension along a maximum/minimum axis and the semantic values big and small, and (2) a quality dimension along a positive/negative axis and the values good (excess; higher degree) and bad (lack; lower degree). In order to provide corpus-based insights into this semantic categorization, we analyze French evaluative prefixes alongside their English translation equivalents in a parallel corpus. To do so, we focus on periphrastic translations, as they are likely to ‘spell out’ the meaning of the French prefixes. The data used were extracted from the Europarl parallel corpus (Koehn 2005; Cartoni and Meyer 2012). Using a tailor-made program, we first aligned the French prefixed words with the corresponding word(s) in English target sentences, before proceeding to the evaluation of the aligned sequences and the manual analysis of the bilingual data. Results confirm that translation data can be used as evidence for semantics in morphological research and help refine existing semantic descriptions of evaluative prefixes.
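
As a rough illustration of the first step only, the sketch below pulls candidate French prefixed words out of sentence pairs so that they can be inspected next to the English sentence. It is not the tailor-made program mentioned in the abstract, and it performs no word alignment. The prefix list, the regular expression and the example sentences are assumptions chosen for illustration.

```python
import re

# Illustrative prefix list only; not the inventory used in the article.
PREFIXES = ("archi", "extra", "hyper", "hypo", "sous", "super", "sur", "ultra")

# A prefix, an optional hyphen, then at least three word characters.
PREFIXED_WORD = re.compile(
    r"\b(" + "|".join(PREFIXES) + r")-?(\w{3,})\b", re.IGNORECASE
)


def find_candidates(sentence_pairs):
    """Yield (prefix, french_word, english_sentence) for each French match."""
    for french, english in sentence_pairs:
        for match in PREFIXED_WORD.finditer(french):
            yield match.group(1).lower(), match.group(0), english


# Hypothetical sentence pairs standing in for parallel corpus data.
pairs = [
    ("Ce secteur est surexploité.", "This sector is excessively exploited."),
    ("Le débat continue.", "The debate continues."),
]

candidates = list(find_candidates(pairs))
for prefix, word, english in candidates:
    print(prefix, word, "|", english)
```

A purely string-based pattern like this over-generates (it would also match words such as "surface"), so its output would still need filtering and manual checking.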


The MuLeXFoR Database: Representing Word-Formation Processes in a Multilingual Lexicographic Environment
Bruno Cartoni | Marie-Aude Lefer
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper introduces a new lexicographic resource, the MuLeXFoR database, which aims to present word-formation processes in a multilingual environment. Morphological items represent a real challenge for lexicography, especially for the development of multilingual tools. Affixes can take part in several word-formation rules and, conversely, rules can be realised by means of a variety of affixes. Consequently, it is often difficult to provide enough information to help users understand the meaning(s) of an affix or familiarise themselves with the most frequent strategies used to translate the meaning(s) conveyed by affixes. In fact, traditional dictionaries often fail to achieve this goal. The database introduced in this paper tries to take advantage of recent advances in electronic implementation and morphological theory. Word-formation is presented as a set of multilingual rules that users can access via different indexes (affixes, rules and constructed words). MuLeXFoR entries contain, among other things, detailed descriptions of morphological constraints and productivity notes, which are sorely lacking in currently available tools such as bilingual dictionaries.
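
The many-to-many relation between affixes and rules, and access through several indexes, can be sketched as a small data structure. This is not MuLeXFoR's actual schema: the class names, field names, rule identifiers and example entries below are hypothetical, and the sketch omits constraints and productivity notes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A word-formation rule realised by affixes in several languages."""
    rule_id: str
    meaning: str
    # language code -> list of (affix, example constructed word)
    realisations: dict = field(default_factory=dict)


class RuleDatabase:
    """Stores rules with lookup indexes by affix and by constructed word."""

    def __init__(self):
        self.rules = {}                    # rule id -> Rule
        self.by_affix = defaultdict(set)   # (lang, affix) -> rule ids
        self.by_word = {}                  # (lang, word) -> rule id

    def add(self, rule):
        self.rules[rule.rule_id] = rule
        for lang, items in rule.realisations.items():
            for affix, word in items:
                self.by_affix[(lang, affix)].add(rule.rule_id)
                self.by_word[(lang, word)] = rule.rule_id

    def rules_for_affix(self, lang, affix):
        """All rules an affix takes part in, sorted by rule id."""
        ids = self.by_affix.get((lang, affix), ())
        return [self.rules[r] for r in sorted(ids)]

    def equivalents(self, lang, affix, target_lang):
        """Affixes in target_lang realising the same rules as the given affix."""
        found = set()
        for rule in self.rules_for_affix(lang, affix):
            found.update(a for a, _ in rule.realisations.get(target_lang, []))
        return sorted(found)


# Hypothetical entries for illustration.
db = RuleDatabase()
db.add(Rule("excess", "to an excessive degree", {
    "fr": [("sur-", "surcharger"), ("hyper-", "hypersensible")],
    "en": [("over-", "overload"), ("hyper-", "hypersensitive")],
}))
db.add(Rule("above", "located or moving above", {
    "fr": [("sur-", "survoler")],
    "en": [("over-", "overfly")],
}))

print([r.rule_id for r in db.rules_for_affix("fr", "sur-")])
print(db.equivalents("fr", "sur-", "en"))
```

Keeping the rule as the central record and deriving the affix and word indexes from it means one affix can point to several rules, and one rule to several affixes, without duplicating entries.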