Yves Lepage

2021

pdf bib
Covering a sentence in form and meaning with fewer retrieved sentences
Yuan Liu | Yves Lepage
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

pdf bib abs
EM Corpus: a comparable corpus for a less-resourced language pair Manipuri-English
Rudali Huidrom | Yves Lepage | Khogendra Khomdram
Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)

In this paper, we introduce a sentence-level comparable text corpus crawled and created for the less-resourced language pair Manipuri (mni) and English (eng). Our monolingual corpora comprise 1.88 million Manipuri sentences and 1.45 million English sentences, and our parallel corpus comprises 124,975 Manipuri-English sentence pairs. These data were crawled and collected over a year from August 2020 to March 2021 from a local newspaper website called ‘The Sangai Express.’ The resources reported in this paper are made available to help the low-resourced languages community with MT/NLP tasks.

2020

pdf bib abs
Réseaux de neurones pour la résolution d’analogies entre phrases en traduction automatique par l’exemple (Neural networks for the resolution of analogies between sentences in EBMT)
Valentin Taillandier | Liyan Wang | Yves Lepage
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

This paper proposes a neural network model for solving analogical equations at the semantic level between sentences, in the context of example-based machine translation. Its originality lies in merging the two approaches to example-based translation, direct and indirect.

pdf bib abs
Video-to-HamNoSys Automated Annotation System
Victor Skobov | Yves Lepage
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

The Hamburg Notation System (HamNoSys) was developed for movement annotation of any sign language (SL) and can be used to produce signing animations for a virtual avatar with the JASigning platform. This makes it possible to use HamNoSys, i.e., strings of characters, as a representation of an SL corpus instead of video material. Processing strings of characters instead of images can significantly contribute to sign language research. However, the complexity of HamNoSys makes annotation slow and laborious. Therefore, annotation has to be automated. This work proposes a conceptually new approach to this problem. It includes a new tree representation of the HamNoSys grammar that serves as a basis for the generation of grammatical training data and classification of complex movements using machine learning. Our automatic annotation system relies on HamNoSys grammar structure and can potentially be used on already existing SL corpora. It is retrainable for specific settings such as camera angles, speed, and gestures. Our approach is conceptually different from other SL recognition solutions and offers a developed methodology for future research.

pdf bib abs
Zero-shot translation among Indian languages
Rudali Huidrom | Yves Lepage
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Standard neural machine translation (NMT) allows a model to perform translation between a pair of languages. Multilingual NMT, on the other hand, allows a model to perform translation between several language pairs, even between language pairs for which no sentence pair has been seen during training (zero-shot translation). This paper presents experiments with zero-shot translation on low-resource Indian languages with a very small amount of data for each language pair. We first report results on balanced data over all considered language pairs. We then extend our experiments with three additional rounds, increasing the training data by 2,000 sentence pairs in each round for some of the language pairs. For Manipuri to Hindi, the translation accuracy obtained during Round-III of zero-shot translation is 7 times the score obtained in the balanced data setting.

2018

pdf bib
IPS-WASEDA system at CoNLL–SIGMORPHON 2018 Shared Task on morphological inflection
Rashel Fam | Yves Lepage
Proceedings of the CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

pdf bib
Context Encoder for Analogies on Strings
Tianjing Zhao | Yves Lepage
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

pdf bib
Korean L2 Vocabulary Prediction: Can a Large Annotated Corpus be Used to Train Better Models for Predicting Unknown Words?
Kevin Yancey | Yves Lepage
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Tools for The Production of Analogical Grids and a Resource of N-gram Analogical Grids in 11 Languages
Rashel Fam | Yves Lepage
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib abs
CHARCUT: Human-Targeted Character-Based MT Evaluation with Loose Differences
Adrien Lardilleux | Yves Lepage
Proceedings of the 14th International Conference on Spoken Language Translation

We present CHARCUT, a character-based machine translation evaluation metric derived from a human-targeted segment difference visualisation algorithm. It combines an iterative search for longest common substrings between the candidate and the reference translation with a simple length-based threshold, enabling loose differences that limit noisy character matches. Its main advantage is to produce scores that directly reflect human-readable string differences, making it a useful support tool for the manual analysis of MT output and its display to end users. Experiments on WMT16 metrics task data show that it is on par with the best “un-trained” metrics in terms of correlation with human judgement, well above BLEU and TER baselines, on both system and segment tasks.
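The matching step described in this abstract (an iterative search for longest common substrings, gated by a length threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CHARCUT implementation: the function names, the default threshold of 3, and the use of sentinel characters to mask matched spans are all hypothetical choices made here, and the actual CHARCUT scoring formula is not reproduced.

```python
def longest_common_substring(a, b):
    """Return (start_a, start_b, length) of the longest common substring."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def matched_length(candidate, reference, min_len=3):
    """Count characters covered by iteratively extracted common substrings.

    Matches shorter than min_len are ignored (hypothetical threshold),
    which filters out noisy one- or two-character coincidences.
    Assumes the inputs do not contain the sentinel characters below.
    """
    cand, ref = list(candidate), list(reference)
    total = 0
    while True:
        i, j, n = longest_common_substring(cand, ref)
        if n < max(min_len, 1):
            break
        total += n
        # Mask matched spans with distinct sentinels so they cannot match again.
        cand[i:i + n] = ["\0"] * n
        ref[j:j + n] = ["\1"] * n
    return total


if __name__ == "__main__":
    print(matched_length("the cat sat", "the cat sits"))
```

With a threshold of 1, every stray single-character coincidence would count as a match; raising the threshold keeps only spans long enough to be meaningful to a human reader.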

pdf bib
Unsupervised Bilingual Segmentation using MDL for Machine Translation
Bin Shan | Hao Wang | Yves Lepage
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

pdf bib
BTG-based Machine Translation with Simple Reordering Model using Structured Perceptron
Hao Wang | Yves Lepage
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2016

pdf bib
Extraction of Bilingual Technical Terms for Chinese-Japanese Patent Translation
Wei Yang | Jinghui Yan | Yves Lepage
Proceedings of the NAACL Student Research Workshop

pdf bib abs
Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments
Hao Wang | Yves Lepage
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

fast_align is a simple and fast word alignment tool that is widely used in state-of-the-art machine translation systems. It yields comparable results in end-to-end translation experiments on various language pairs. However, fast_align does not perform as well as GIZA++ when applied to language pairs with distinct word orders, like English and Japanese. In this paper, given the lexical translation table output by fast_align, we propose to realign words using the hierarchical sub-sentential alignment approach. Experimental results show that this simple additional processing improves the performance of word alignment, which is measured by counting alignment matches in comparison with fast_align. We also report the results of final machine translation in both English-Japanese and Japanese-English. We show that our best system provides significant improvements over the baseline as measured by BLEU and RIBES.

pdf bib abs
Improving Patent Translation using Bilingual Term Extraction and Re-tokenization for Chinese–Japanese
Wei Yang | Yves Lepage
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

Unlike European languages, many Asian languages like Chinese and Japanese do not have typographic word boundaries in their writing systems. Word segmentation (tokenization), which breaks sentences down into individual words (tokens), is normally treated as the first step for machine translation (MT). For Chinese and Japanese, different rules and segmentation tools lead to segmentation results at different levels of granularity in the two languages. To improve translation accuracy, we adjust and balance the granularity of segmentation results around terms in a Chinese–Japanese patent corpus used for training the translation model. In this paper, we describe a statistical machine translation (SMT) system built on a re-tokenized Chinese–Japanese patent training corpus using extracted bilingual multi-word terms.

pdf bib
HSSA tree structures for BTG-based preordering in machine translation
Yujia Zhang | Hao Wang | Yves Lepage
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

pdf bib
Yet Another Symmetrical and Real-time Word Alignment Method: Hierarchical Sub-sentential Alignment using F-measure
Hao Wang | Yves Lepage
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

2015

pdf bib
Sampling-based Alignment and Hierarchical Sub-sentential Alignment in Chinese–Japanese Translation of Patents
Wei Yang | Zhongwen Zhao | Baosong Yang | Yves Lepage
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)

pdf bib
Translation of Unseen Bigrams by Analogy Using an SVM Classifier
Hao Wang | Lu Lyu | Yves Lepage
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

pdf bib
Chinese Word Segmentation based on analogy and majority voting
Zongrong Zheng | Yi Wang | Yves Lepage
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

2014

pdf bib abs
Production of Phrase Tables in 11 European Languages using an Improved Sub-sentential Aligner
Juan Luo | Yves Lepage
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper is a partial report on an ongoing Kakenhi project which aims to improve sub-sentential alignment and release multilingual syntactic patterns for statistical and example-based machine translation. Here we focus on improving a sub-sentential aligner which is an instance of the association approach. Phrase tables are not only an essential component of machine translation systems but also an important resource for research and use in other domains. As part of this project, all phrase tables produced in the experiments will also be made freely available.

pdf bib
Measuring Similarity from Word Pair Matrices with Syntagmatic and Paradigmatic Associations
Jin Matsuoka | Yves Lepage
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

pdf bib
Consistent Improvement in Translation Quality of Chinese-Japanese Technical Texts by Adding Additional Quasi-parallel Training Data
Wei Yang | Yves Lepage
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Testing Distributional Hypothesis in Patent Translation
Hsin-Hung Lin | Yves Lepage
Proceedings of the 26th Conference on Computational Linguistics and Speech Processing (ROCLING 2014)

We show, in a series of experiments on four languages using samples of the Europarl corpus, that the vast majority of unknown trigrams in a test set can be reconstructed by analogy with hapax trigrams from the training corpus. From this result, we derive a simple smoothing method for trigram language models and obtain better results than Witten-Bell, Good-Turing and Kneser-Ney smoothing in experiments conducted in eleven languages on the common part of Europarl, except for Finnish and, to a lesser extent, French.

pdf bib abs
Généralisation de l’alignement sous-phrastique par échantillonnage (Generalization of sub-sentential alignment by sampling)
Adrien Lardilleux | François Yvon | Yves Lepage
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Sub-sentential alignment consists in extracting translations of textual units smaller than the sentence from multilingual parallel texts aligned at the sentence level. Such alignment is needed, for example, to train statistical machine translation systems. The standard approach to this task involves successively estimating several probabilistic models of increasing complexity and using heuristics that align single words and then, by extension, groups of words. In this paper, we consider an alternative approach, initially proposed in (Lardilleux & Lepage, 2008), which rests on a much simpler principle: comparing occurrence profiles in subcorpora obtained by sampling. After analysing the strengths and weaknesses of this approach, we show how to improve the detection of long translation units, and evaluate these improvements on machine translation tasks.
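The principle named in this abstract, comparing occurrence profiles in sampled subcorpora, can be illustrated with a toy sketch. It assumes that a word's "occurrence profile" is the set of sampled sentence pairs in which it appears, and that words from both languages sharing exactly the same profile are proposed as translation candidates. The function name, the whitespace tokenization and the single-sample design are assumptions made for illustration only; they are not taken from the paper or from the anymalign tool.

```python
import random
from collections import defaultdict


def profile_alignments(pairs, sample_size, seed=0):
    """Group source and target words that share an occurrence profile.

    pairs: list of (source_sentence, target_sentence) strings.
    Returns a list of (source_words, target_words) candidate alignments.
    """
    rng = random.Random(seed)
    sample = rng.sample(pairs, min(sample_size, len(pairs)))

    # Profile of a word = set of sampled pair indices where it occurs.
    profiles = defaultdict(set)
    for idx, (src, tgt) in enumerate(sample):
        for word in set(src.split()):
            profiles[("src", word)].add(idx)
        for word in set(tgt.split()):
            profiles[("tgt", word)].add(idx)

    # Words with identical profiles end up in the same group.
    groups = defaultdict(lambda: {"src": set(), "tgt": set()})
    for (side, word), occurrences in profiles.items():
        groups[frozenset(occurrences)][side].add(word)

    # Keep only groups that contain words from both languages.
    return [
        (sorted(g["src"]), sorted(g["tgt"]))
        for g in groups.values()
        if g["src"] and g["tgt"]
    ]


if __name__ == "__main__":
    corpus = [
        ("one coffee", "un café"),
        ("one tea", "un thé"),
        ("the coffee", "le café"),
    ]
    for src_words, tgt_words in sorted(profile_alignments(corpus, 3)):
        print(src_words, "<->", tgt_words)
```

On a real corpus, one large sample rarely isolates words this cleanly; a practical system would presumably draw many small samples and accumulate counts over them, which this sketch leaves out.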

pdf bib abs
Évaluation de G-LexAr pour la traduction automatique statistique (Evaluation of G-Lexar for statistical machine translation)
Wigdan Mekki | Julien Gosme | Fathi Debili | Yves Lepage | Nadine Lucas
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

G-LexAr is a morphological analyser for Arabic that has recently been substantially improved. This paper evaluates the analyser as a preprocessing tool for statistical machine translation, something that had never been done before. We study the impact of the different forms produced by its analysis (vocalization, lemmatization and segmentation) on an Arabic-English translation system, as well as the impact of combining these forms. Our experiments show that using each form separately has little influence on the quality of the translations obtained, whereas combining them is highly beneficial.

pdf bib
Improving Sampling-based Alignment by Investigating the Distribution of N-grams in Phrase Translation Tables
Juan Luo | Adrien Lardilleux | Yves Lepage
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

pdf bib
Fully-Automatic Marker-based Chunking in 11 European Languages and Counts of the Number of Analogies between Chunks
Kota Takeya | Yves Lepage
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

pdf bib abs
Bilingual Lexicon Induction: Effortless Evaluation of Word Alignment Tools and Production of Resources for Improbable Language Pairs
Adrien Lardilleux | Julien Gosme | Yves Lepage
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present a simple protocol to evaluate word aligners on bilingual lexicon induction tasks from parallel corpora. Rather than resorting to gold standards, it relies on a comparison of the outputs of word aligners against a reference bilingual lexicon. The quality of this reference bilingual lexicon does not need to be particularly high, because evaluation quality is ensured by systematically filtering this reference lexicon with the parallel corpus the word aligners are trained on. We perform a comparison of three freely available word aligners on numerous language pairs from the Bible parallel corpus (Resnik et al., 1999): MGIZA++ (Gao and Vogel, 2008), BerkeleyAligner (Liang et al., 2006), and Anymalign (Lardilleux and Lepage, 2009). We then select the most appropriate one to produce bilingual lexicons for all language pairs of this corpus. These involve Cebuano, Chinese, Danish, English, Finnish, French, Greek, Indonesian, Latin, Spanish, Swedish, and Vietnamese. The 66 resulting lexicons are made freely available.

pdf bib abs
The GREYC/LLACAN machine translation systems for the IWSLT 2010 campaign
Julien Gosme | Wigdan Mekki | Fathi Debili | Yves Lepage | Nadine Lucas
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper we explore the contribution of two Arabic morphological analyzers used as preprocessing tools for statistical machine translation. Similar investigations have already been reported for morphologically rich languages like German, Turkish and Arabic. Here, we focus on the case of the Arabic language and mainly discuss the use of the G-LexAr analyzer. A preliminary experiment was designed to choose the most promising translation system among the 3 G-LexAr-based systems; we concluded that the systems are equivalent. Nevertheless, we decided to use the lemmatized output of G-LexAr and submit its translations as the primary run for the BTEC AE track. The results showed that G-LexAr outputs degrade translation compared to the basic SMT system trained on the un-analyzed corpus.

pdf bib
The True Score of Statistical Paraphrase Generation
Jonathan Chevelu | Ghislain Putois | Yves Lepage
Coling 2010: Posters

pdf bib abs
L’évaluation des paraphrases : pour une prise en compte de la tâche
Jonathan Chevelu | Yves Lepage | Thierry Moudenc | Ghislain Putois
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Definitions of paraphrase generally emphasize meaning preservation. This paper shows, by reductio ad absurdum, that an evaluation based solely on meaning preservation allows a useless paraphrase generation system to be judged better than a state-of-the-art system. Meaning preservation is therefore not the only criterion for paraphrases. We identify three objectives for paraphrases: meaning preservation, naturalness and suitability for the task. Paraphrase generation is thus a task-dependent trade-off between these three criteria, which must be taken into account in evaluations.

2009

pdf bib abs
anymalign : un outil d’alignement sous-phrastique libre pour les êtres humains
Adrien Lardilleux | Yves Lepage
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

We present anymalign, a sub-sentential aligner for the general public. The quality of its results rivals that of the best tool in the field, GIZA++. It is fast and easy to use, and produces dictionaries and other translation tables with a single command. To our knowledge, it is the only tool in the world that can align any number of languages simultaneously. It is therefore the first truly multilingual sub-sentential aligner.

pdf bib
Introduction of a new paraphrase generation tool based on Monte-Carlo sampling
Jonathan Chevelu | Thomas Lavergne | Yves Lepage | Thierry Moudenc
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Towards automatic acquisition of linguistic features
Yves Lepage | Chooi Ling Goh
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

pdf bib
Sampling-based Multilingual Alignment
Adrien Lardilleux | Yves Lepage
Proceedings of the International Conference RANLP-2009

pdf bib abs
The GREYC translation memory for the IWSLT 2009 evaluation campaign
Yves Lepage | Adrien Lardilleux | Julien Gosme
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This year’s GREYC translation system is an improved translation memory that was designed from scratch to experiment with an approach whose goal is just to improve over the output of a standard translation memory by making heavy use of sub-sentential alignments in a restricted case of translation by analogy. The tracks the system participated in are all BTEC tracks: Arabic to English, Chinese to English, and Turkish to English.

2008

pdf bib
Multilingual Alignments by Monolingual String Differences
Adrien Lardilleux | Yves Lepage
Coling 2008: Companion volume: Posters

pdf bib abs
The GREYC machine translation system for the IWSLT 2008 evaluation campaign
Yves Lepage | Adrien Lardilleux | Julien Gosme | Jean-Luc Manguin
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This year's GREYC machine translation (MT) system presents three major changes relative to the system presented during the previous campaign, while, of course, remaining a pure example-based MT system that exploits proportional analogies. Firstly, the analogy solver has been replaced with a truly non-deterministic one. Secondly, the engine has been re-engineered and better control has been introduced. Thirdly, the data used for translation were the data provided by the organizers plus alignments obtained using a new alignment method. This year we chose to have the engine run with the word as the processing unit, in contrast to previous years, where the processing unit was the character. The tracks the system participated in are all classic BTEC tracks (Arabic-English, Chinese-English and Chinese-Spanish) plus the so-called PIVOT task, where the test set had to be translated from Chinese into Spanish by way of English.

pdf bib abs
A truly multilingual, high coverage, accurate, yet simple, subsentential alignment method
Adrien Lardilleux | Yves Lepage
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper describes a new alignment method that extracts high quality multi-word alignments from sentence-aligned multilingual parallel corpora. The method can handle several languages at once. The phrase tables obtained by the method have a comparable accuracy and a higher coverage than those obtained by current methods. They are also obtained much faster.

2007

pdf bib abs
The GREYC machine translation system for the IWSLT 2007 evaluation campaign
Yves Lepage | Adrien Lardilleux
Proceedings of the Fourth International Workshop on Spoken Language Translation

The GREYC machine translation (MT) system is a slight evolution of the ALEPH machine translation system that participated in the IWSLT 2005 campaign. It is a pure example-based MT system that exploits proportional analogies. The training data used for this campaign were limited on purpose to the sole data provided by the organizers. However, the training data were expanded with the results of sub-sentential alignments. The system participated in the two classical tasks of translation of manually transcribed texts from Japanese to English and Arabic to English.

2006

pdf bib abs
Analogie en traitement automatique des langues. Application à la traduction automatique
Yves Lepage
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Tutoriels

We situate ourselves within the current corpus-based trend in natural language processing, and within a perspective that can be described as a least-effort approach: the aim is to examine the limits of what can be processed from raw, that is, non-preprocessed, textual data. The underlying theoretical question is: what are the fundamental operations in language? Proportional analogy has been mentioned by many grammarians and linguists. We set out to show the effectiveness of this operation by testing it on a hard natural language processing task: machine translation. We also show the benefits of formalizing this operation, with theoretical results in formal language theory that bear on their adequacy for describing natural languages. In this way, a fundamental operation in language, proportional analogy, is illustrated both in its theoretical aspects and in its practical performance.

2005

pdf bib
ALEPH: an EBMT system based on the preservation of proportional analogies between sentences across languages
Yves Lepage | Etienne Denoual
Proceedings of the Second International Workshop on Spoken Language Translation

pdf bib abs
The ‘purest’ EBMT System Ever Built: No Variables, No Templates, No Training, Examples, Just Examples, Only Examples
Yves Lepage | Etienne Denoual
Workshop on example-based machine translation

We designed, implemented and assessed an EBMT system that can be dubbed the “purest ever built”: it strictly does not make any use of variables, templates or training, does not have any explicit transfer component, and does not require any preprocessing of the aligned examples. It uses a specific operation, namely proportional analogy, that implicitly neutralises divergences between languages and captures lexical and syntactical variations along the paradigmatic and syntagmatic axes without explicitly decomposing sentences into fragments. In an experiment with a test set of 510 input sentences and an unprocessed corpus of almost 160,000 aligned sentences in Japanese and English, we obtained BLEU, NIST and mWER scores of 0.53, 8.53 and 0.39 respectively, well above a baseline simulating a translation memory.

pdf bib
BLEU in Characters: Towards Automatic MT Evaluation in Languages without Word Delimiters
Etienne Denoual | Yves Lepage
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

pdf bib
Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation
Yves Lepage | Etienne Denoual
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)

2004

pdf bib
Lower and higher estimates of the number of “true analogies” between sentences contained in a large multilingual corpus
Yves Lepage
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib abs
Using Paradigm Tables to Generate New Utterances Similar to those Existing in Linguistic Resources
Yves Lepage | Guilhem Peralta
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

We inspect the possibility of creating new linguistic utterances (small sentences) similar to those already present in an existing linguistic resource. Using paradigm tables ensures that the new generated sentences resemble previous data, while being of course different. We report an experiment in which 1,201 new correct sentences were generated starting from only 22 seed sentences.

2001

pdf bib abs
Aides à l’analyse pour la construction de banque d’arbres : étude de l’effort
Nicolas Auclerc | Yves Lepage
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Building a treebank is a heavy, time-consuming undertaking. To facilitate it, we view treebank construction as a series of editing and search operations. The goal of this paper is to estimate the effort, in number of editing operations, needed to add a new sentence to the treebank. We propose a tool, Boardedit, which includes a tree editor and parsing aids. Since the effort required naturally depends on the quality of the answers provided by the parsing aids, it can be seen as a measure of the quality of these aids. As the tree editor remained indispensable to our tool during the experiment, the parsing aids are always used together with the tree editor. In the experiment, we extend a treebank of 5,000 sentences with 1,553 new sentences. The reduction obtained exceeds 4/5 of the effort.

pdf bib abs
Défense et illustration de l’analogie
Yves Lepage
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

The generativist argument against analogy rested on three points: the innateness hypothesis, the context-free hypothesis, and overgeneration. Theoretical and experimental results based on a new computational formulation of analogy contribute constructively to refuting these points.