2016
pdf
bib
abs
Un outil multilingue d’extraction de collocations en ligne (This demo shows the web version of a multilingual collocation extraction tool)
Luka Nerima
|
Violeta Seretan
|
Eric Wehrli
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations
This demonstration presents the web version of a multilingual collocation extraction tool. It is intended for lexicographers, translators, L2 teachers and learners and, more generally, linguists who wish to analyse and exploit their own corpora.
2015
pdf
bib
The ACCEPT Academic Portal: Bringing Together Pre-editing, MT and Post-editing into a Learning Environment
Pierrette Bouillon
|
Johanna Gerlach
|
Asheesh Gulati
|
Victoria Porro
|
Violeta Seretan
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
2014
pdf
bib
abs
A Large-Scale Evaluation of Pre-editing Strategies for Improving User-Generated Content Translation
Violeta Seretan
|
Pierrette Bouillon
|
Johanna Gerlach
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
User-generated content represents an increasing share of the information available today. To make this type of content instantly accessible in another language, the ACCEPT project focuses on developing pre-editing technologies for correcting the source text in order to increase its translatability. Linguistically-informed pre-editing rules have been developed for English and French for the two domains considered by the project, namely, the technical domain and the healthcare domain. In this paper, we present the evaluation experiments carried out to assess the impact of the proposed pre-editing rules on translation quality. Results from a large-scale evaluation campaign show that pre-editing does indeed help attain better translation quality for a high proportion of the data; the difference from the number of cases where an adverse effect is observed is statistically significant. The ACCEPT pre-editing technology is freely available online and can be used in any Web-based environment to enhance the translatability of user-generated content so that it reaches a broader audience.
pdf
bib
Rule-based automatic post-processing of SMT output to reduce human post-editing effort
Victoria Porro
|
Johanna Gerlach
|
Pierrette Bouillon
|
Violeta Seretan
Proceedings of Translating and the Computer 36
pdf
bib
The ACCEPT Portal: An Online Framework for the Pre-editing and Post-editing of User-Generated Content
Violeta Seretan
|
Johann Roturier
|
David Silva
|
Pierrette Bouillon
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation
2013
bib
On translating syntactically-flexible expressions
Violeta Seretan
Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technologies
2012
pdf
bib
abs
Acquisition of Syntactic Simplification Rules for French
Violeta Seretan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Text simplification is the process of reducing the lexical and syntactic complexity of a text while attempting to preserve (most of) its information content. It has recently emerged as an important research area, which holds promise for enhancing text readability for the benefit of a broader audience as well as for increasing the performance of other applications. Our work focuses on syntactic complexity reduction and deals with the task of corpus-based acquisition of syntactic simplification rules for the French language. We show that the data-driven manual acquisition of simplification rules can be complemented by the semi-automatic detection of syntactic constructions requiring simplification. We provide the first comprehensive set of syntactic simplification rules for French, whose size is comparable to similar resources that exist for English and Brazilian Portuguese. Unlike these manually-built resources, our resource integrates larger lists of lexical cues signaling simplifiable constructions, which are useful for informing practical systems.
2011
pdf
bib
abs
Une approche de résumé automatique basée sur les collocations (A Collocation-Driven Approach to Text Summarization)
Violeta Seretan
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
In this article, we describe a new approach to extractive summarization (the task of automatically creating a summary of a document by selecting a subset of its sentences) that exploits domain-specific collocational information acquired beforehand from a development corpus. A collocation extractor based on syntactic parsing is used to infer a content model, which is then applied to the document to be summarized. This approach was used to create simple versions of English Wikipedia articles, as part of a project aimed at the automatic creation of simplified articles similar to those found in Simple English Wikipedia. An evaluation of the developed system remains to be carried out. Nevertheless, the preliminary results obtained for articles about cities show the potential of this collocation-driven approach for selecting relevant sentences.
pdf
bib
Une Suite d’interaction de fouille basée sur la compréhension du langage naturel (An Interaction Mining Suite Based On Natural Language Understanding)
Rodolfo Delmonte
|
Vincenzo Pallotta
|
Violeta Seretan
|
Lammert Vrieling
|
David Walker
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations
pdf
bib
FipsCoView: On-line Visualisation of Collocations Extracted from Multilingual Parallel Corpora
Violeta Seretan
|
Eric Wehrli
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
2010
pdf
bib
abs
FipsRomanian: Towards a Romanian Version of the Fips Syntactic Parser
Violeta Seretan
|
Eric Wehrli
|
Luka Nerima
|
Gabriela Soare
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We describe work in progress on the development of a full syntactic parser for Romanian. This work is part of a larger project of multilingual extension of the Fips parser (Wehrli, 2007), already available for French, English, German, Spanish, Italian, and Greek, to four new languages (Romanian, Romansh, Russian and Japanese). The Romanian version was built by starting with the Fips generic parsing architecture for the Romance languages and customising the grammatical component, in close relation to the development of the lexical component. We describe this process and report on preliminary results obtained for journalistic texts.
pdf
bib
abs
A Recursive Treatment of Collocations
Luka Nerima
|
Eric Wehrli
|
Violeta Seretan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This article discusses the treatment of collocations in the context of a long-term project on the development of multilingual NLP tools. Besides classical two-word collocations, we focus on the case of complex collocations (3 words or more), for which a recursive design is presented in the form of collocations of collocations. Although comparatively less numerous than two-word collocations, complex collocations pose important challenges for NLP. The article discusses how these collocations are retrieved from corpora, inserted and stored in a lexical database, how the parser uses such knowledge, and what advantages a recursive approach to complex collocations offers.
pdf
bib
Sentence Analysis and Collocation Identification
Eric Wehrli
|
Violeta Seretan
|
Luka Nerima
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications
2009
pdf
bib
A Tool for Multi-Word Expression Extraction in Modern Greek Using Syntactic Parsing
Athina Michou
|
Violeta Seretan
Proceedings of the Demonstrations Session at EACL 2009
pdf
bib
Collocations in a Rule-Based MT System: A Case Study Evaluation of their Translation Adequacy
Eric Wehrli
|
Violeta Seretan
|
Luka Nerima
|
Lorenza Russo
Proceedings of the 13th Annual conference of the European Association for Machine Translation
2007
pdf
bib
User Requirements Analysis for Meeting Information Retrieval Based on Query Elicitation
Vincenzo Pallotta
|
Violeta Seretan
|
Marita Ailomaa
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
pdf
bib
Proceedings of the ACL 2007 Student Research Workshop
Chris Biemann
|
Violeta Seretan
|
Ellen Riloff
Proceedings of the ACL 2007 Student Research Workshop
pdf
bib
abs
Collocation translation based on sentence alignment and parsing
Violeta Seretan
|
Éric Wehrli
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Although many efforts have been devoted to extracting collocations from text corpora, only a minority of studies are also concerned with making the extraction results ready for use in the NLP applications that could benefit from them, such as machine translation. This article describes a precise method for identifying the translation of collocations in a parallel corpus, which has the following advantages: it can handle flexible (not only fixed) collocations; it requires limited resources and reasonable computing power (no full alignment, no training); it can be applied to several language pairs and works even in the absence of bilingual dictionaries. The method is based on syntactic information provided by the multilingual parser Fips. The evaluation, carried out on 4000 verb-object collocations from several language pairs, showed an average precision of 89.8% and a satisfactory coverage (70.9%). These results are higher than those reported in the evaluation of other collocation translation methods.
2006
pdf
bib
Multilingual Collocation Extraction: Issues and Solutions
Violeta Seretan
|
Eric Wehrli
Proceedings of the Workshop on Multilingual Language Resources and Interoperability
pdf
bib
Accurate Collocation Extraction Using a Multilingual Parser
Violeta Seretan
|
Eric Wehrli
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
2004
pdf
bib
Using the Web as a Corpus for the Syntactic-Based Collocation Identification
Violeta Seretan
|
Luka Nerima
|
Eric Wehrli
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2003
pdf
bib
Creating a multilingual collocations dictionary from large text corpora
Luka Nerima
|
Violeta Seretan
|
Eric Wehrli
10th Conference of the European Chapter of the Association for Computational Linguistics
2002
pdf
bib
The Use of Referential Constraints in Structuring Discourse
Violeta Seretan
|
Dan Cristea
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)