Eric Wehrli

Also published as: Éric Wehrli

2020

La résolution d’anaphores au-delà de la frontière de la phrase (The Anaphora Resolution Beyond Sentence Boundary)
Luka Nerima | Eric Wehrli
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

This demonstration presents an extension of our syntactic parsing and part-of-speech tagging tools that handles pronominal anaphora resolution not only within a sentence but also when the antecedent is in the preceding sentence. Since both the parser and the tagger perform a full syntactic analysis of sentences, these tools also display the grammatical functions of constituents (subject, direct object, etc.) and the arguments of verbs. A version of this demonstration is available on the Web.

2017

Parsing and MWE Detection: Fips at the PARSEME Shared Task
Vasiliki Foufi | Luka Nerima | Éric Wehrli
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

Identifying multiword expressions (MWEs) in a sentence, in order to ensure their proper processing in subsequent applications such as machine translation, and performing the syntactic analysis of the sentence are interrelated processes. In our approach, priority is given to parsing alternatives involving collocations; collocational information thus helps the parser through the maze of alternatives, with the aim of substantially improving the performance of both tasks (collocation identification and parsing) and of a subsequent task (machine translation). In this paper, we present our system and the procedure we followed in order to participate in the open track of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) in running texts.

2016

Un outil multilingue d’extraction de collocations en ligne (This demo shows the web version of a multilingual collocation extraction tool)
Luka Nerima | Violeta Seretan | Eric Wehrli
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations

This demonstration presents the web version of a multilingual collocation extraction tool. It is intended for lexicographers, translators, L2 teachers and learners and, more generally, linguists who wish to analyze and exploit their own corpora.

On-line Multilingual Linguistic Services
Eric Wehrli | Yves Scherrer | Luka Nerima
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

In this demo, we present our free on-line multilingual linguistic services, which allow users to analyze sentences or to extract collocations from a corpus directly on-line, or by uploading a corpus. They are available for 8 European languages (English, French, German, Greek, Italian, Portuguese, Romanian, Spanish) and can also be accessed as web services by programs. While several open systems are available for POS-tagging and dependency parsing or terminology extraction, their integration into an application requires some computational competence. Furthermore, none of the parsers/taggers handles MWEs very satisfactorily, in particular when the two terms of the collocation are distant from each other or in reverse order. Our tools, on the other hand, are specifically designed for users with no particular computational literacy. They do not require any download, installation or adaptation from the user if used on-line, and their integration in an application, using one of the scripts described below, is quite easy. Furthermore, by default, the parser handles collocations and other MWEs, as well as anaphora resolution (limited to 3rd person personal pronouns). When used in the tagger mode, it can be set to display grammatical functions and collocations.

2015

Rule-Based Pronominal Anaphora Treatment for Machine Translation
Sharid Loáiciga | Éric Wehrli
Proceedings of the Second Workshop on Discourse in Machine Translation

2014

SwissAdmin: A multilingual tagged parallel corpus of press releases
Yves Scherrer | Luka Nerima | Lorenza Russo | Maria Ivanova | Eric Wehrli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

SwissAdmin is a new multilingual corpus of press releases from the Swiss Federal Administration, available in German, French, Italian and English. We provide SwissAdmin in three versions: (i) plain texts of approximately 6 to 8 million words per language; (ii) sentence-aligned bilingual texts for each language pair; (iii) a part-of-speech-tagged version consisting of annotations in both the Universal tagset and the richer Fips tagset, along with grammatical functions, verb valencies and collocations. The SwissAdmin corpus is freely available at www.latl.unige.ch/swissadmin.

Proceedings of the 10th Workshop on Multiword Expressions (MWE)
Valia Kordoni | Markus Egg | Agata Savary | Eric Wehrli | Stefan Evert
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

The Relevance of Collocations for Parsing
Eric Wehrli
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

2013

Anaphora resolution, collocations and translation
Eric Wehrli | Luka Nerima
Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technologies

Anaphora Resolution Applied to Collocation Identification: A Preliminary Evaluation (Résolution d’anaphores appliquée aux collocations: une évaluation préliminaire) [in French]
Luka Nerima | Éric Wehrli
Proceedings of TALN 2013 (Volume 2: Short Papers)

2011

La traduction automatique des séquences clitiques dans un traducteur à base de règles (Automatic translation of clitic sequences in a rule-based MT system)
Lorenza Russo | Éric Wehrli
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

In this paper, we discuss the methodology used by Its-2, a rule-based translation system, to translate clitic pronouns. In particular, we focus on clitic sequences in machine translation between French and English. An evaluation based on a corpus of constructed sentences shows the potential of our approach for producing good-quality translations.

Étude inter-langues de la distribution et des ambiguïtés syntaxiques des pronoms (A study of cross-language distribution and syntactic ambiguities of pronouns)
Lorenza Russo | Yves Scherrer | Jean-Philippe Goldman | Sharid Loáiciga | Luka Nerima | Éric Wehrli
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This work describes the distribution of pronouns according to text style (literary or journalistic) and language (French, English, German and Italian). Based on part-of-speech tagging performed automatically and then verified manually, we observe that the proportion of the different types of pronouns varies with text type and language. We discuss the most ambiguous categories in detail. Since we used the Fips syntactic parser to tag the pronouns, we also evaluated it and obtained an average precision of over 95%.

La traduction automatique des pronoms. Problèmes et perspectives (Automatic translation of pronouns. Problems and perspectives)
Yves Scherrer | Lorenza Russo | Jean-Philippe Goldman | Sharid Loáiciga | Luka Nerima | Éric Wehrli
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

In this study, our machine translation system, Its-2, underwent a manual evaluation of pronoun translation for five language pairs and on two corpora: a literary corpus and a corpus of press releases. The results show that error rates can reach 60% depending on the language pair and the corpus. We therefore discuss two research directions for improving the performance of Its-2: the resolution of parsing ambiguities and the resolution of pronominal anaphora.

FipsCoView: On-line Visualisation of Collocations Extracted from Multilingual Parallel Corpora
Violeta Seretan | Eric Wehrli
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

2010

FipsRomanian: Towards a Romanian Version of the Fips Syntactic Parser
Violeta Seretan | Eric Wehrli | Luka Nerima | Gabriela Soare
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe work in progress on the development of a full syntactic parser for Romanian. This work is part of a larger project of multilingual extension of the Fips parser (Wehrli, 2007), already available for French, English, German, Spanish, Italian, and Greek, to four new languages (Romanian, Romansh, Russian and Japanese). The Romanian version was built by starting with the Fips generic parsing architecture for the Romance languages and customising the grammatical component, in close relation to the development of the lexical component. We describe this process and report on preliminary results obtained for journalistic texts.

A Recursive Treatment of Collocations
Luka Nerima | Eric Wehrli | Violeta Seretan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This article discusses the treatment of collocations in the context of a long-term project on the development of multilingual NLP tools. Besides classical two-word collocations, we will focus on the case of complex collocations (3 words or more) for which a recursive design is presented in the form of collocation of collocations. Although comparatively less numerous than two-word collocations, the complex collocations pose important challenges for NLP. The article discusses how these collocations are retrieved from corpora, inserted and stored in a lexical database, how the parser uses such knowledge and what are the advantages offered by a recursive approach to complex collocations.

Sentence Analysis and Collocation Identification
Eric Wehrli | Violeta Seretan | Luka Nerima
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

2009

Deep Linguistic Multilingual Translation and Bilingual Dictionaries
Eric Wehrli | Luka Nerima | Yves Scherrer
Proceedings of the Fourth Workshop on Statistical Machine Translation

Collocations in a Rule-Based MT System: A Case Study Evaluation of their Translation Adequacy
Eric Wehrli | Violeta Seretan | Luka Nerima | Lorenza Russo
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

2008

Generating Bilingual Dictionaries by Transitivity
Luka Nerima | Eric Wehrli
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Recently the LATL has undertaken the development of a multilingual translation system based on a symbolic parsing technology and on a transfer-based translation model. A crucial component of the system is the lexical database, notably the bilingual dictionaries containing the information for the lexical transfer from one language to another. As the number of necessary bilingual dictionaries is a quadratic function of the number of languages considered, we face the problem of obtaining a large number of dictionaries. In this paper we discuss a solution for deriving a bilingual dictionary by transitivity from existing ones and for checking the generated translations in a parallel corpus. Our first experiments concern the generation of two bilingual dictionaries, and the quality of the entries is very promising. The number of generated entries could however be improved, and we conclude the paper with the possible ways we plan to explore.

Traduction multilingue : le projet MulTra (Multilingual translation: the MulTra project)
Éric Wehrli | Luka Nerima
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

The rapid growth of multicultural exchanges and communication, particularly on the Internet, intensifies the need for multilingual tools, including translation tools. This paper describes an ongoing project at LATL to develop a multilingual translation system based on an abstract and largely generic linguistic model, as well as on a software model based on the notion of object. The languages considered in the first phase of this project are German, French, Italian, Spanish and English.

2007

Fips, A “Deep” Linguistic Multilingual Parser
Eric Wehrli
ACL 2007 Workshop on Deep Linguistic Processing

Collocation translation based on sentence alignment and parsing
Violeta Seretan | Éric Wehrli
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Although considerable effort has been devoted to extracting collocations from text corpora, only a minority of studies are also concerned with making the extraction results ready for use in the NLP applications that could benefit from them, such as machine translation. This paper describes a precise method for identifying the translation of collocations in a parallel corpus, which has the following advantages: it can handle flexible (not only fixed) collocations; it requires limited resources and reasonable computing power (no full alignment, no training); it can be applied to several language pairs and works even in the absence of bilingual dictionaries. The method is based on syntactic information provided by the multilingual parser Fips. The evaluation, carried out on 4000 verb-object collocations covering several language pairs, showed an average precision of 89.8% and a satisfactory coverage (70.9%). These results are higher than those reported in evaluations of other collocation translation methods.

2006

Accurate Collocation Extraction Using a Multilingual Parser
Violeta Seretan | Eric Wehrli
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

TwicPen: Hand-held Scanner and Translation Software for non-Native Readers
Eric Wehrli
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

Multilingual Collocation Extraction: Issues and Solutions
Violeta Seretan | Eric Wehrli
Proceedings of the Workshop on Multilingual Language Resources and Interoperability

2004

Using the Web as a Corpus for the Syntactic-Based Collocation Identification
Violeta Seretan | Luka Nerima | Eric Wehrli
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Traduction, traduction de mots, traduction de phrases (Translation, word translation, sentence translation)
Éric Wehrli
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

One consequence of the development of the Internet and the globalization of exchanges is the considerable number of people who consult online documents in a language other than their own. After showing that neither machine translation nor online terminology aids constitute a fully adequate response to this new need, this paper presents a foreign-language reading aid based on a powerful syntactic parser. For a word selected by the user, this system analyzes the entire sentence in order (i) to choose the reading of the selected word best suited to the morphosyntactic context and (ii) to identify any idiomatic expression or collocation of which the word may be a part. A demonstration of this system, named TWiC (Translation of words in context), can be presented.

2003

Lexical knowledge representation with contextonyms
Hyungsuk Ji | Sabine Ploux | Eric Wehrli
Proceedings of Machine Translation Summit IX: Papers

Inter-word associations like stagger - drunken, or intra-word sense divisions (e.g. write a diary vs. write an article) are difficult to compile using a traditional lexicographic approach. As an alternative, we present a model that reflects this kind of subtle lexical knowledge. Based on the minimal sense of a word (clique), the model (1) selects contextually related words (contexonyms) and (2) classifies them in a multi-dimensional semantic space. Trained on very large corpora, the model provides relevant, organized contexonyms that reflect the fine-grained connotations and contextual usage of the target word, as well as the distinct senses of homonyms and polysemous words. Further study on the neighbor effect showed that the model can handle the data sparseness problem.

Translation of words in context
Eric Wehrli
Proceedings of Machine Translation Summit IX: System Presentations

TWiC is an on-line word and expression translation system which uses a powerful parser to (i) properly identify the relevant lexical units, (ii) retrieve the base form of the selected word and (iii) recognize the presence of a multiword expression (compound, idiom, collocation) the selected word may be part of. The conjunction of state-of-the-art natural language parsing, multiword expression identification and large bilingual databases provides a powerful and effective tool for people who want to read on-line material in a foreign language which they are not completely fluent in. A full prototype version of TWiC has been completed for the English-French pair of languages.

Creating a multilingual collocations dictionary from large text corpora
Luka Nerima | Violeta Seretan | Eric Wehrli
10th Conference of the European Chapter of the Association for Computational Linguistics
