2022
Aesop’s fable “The North Wind and the Sun” Used as a Rosetta Stone to Extract and Map Spoken Words in Under-resourced Languages
Elena Knyazeva | Philippe Boula de Mareüil | Frédéric Vernier
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper describes a method of semi-automatic word spotting in minority languages, based on a single Aesop fable, “The North Wind and the Sun”, translated into Romance languages/dialects from Hexagonal (i.e. Metropolitan) France and languages from French Polynesia. The first task consisted of finding out how a dozen words such as “wind” and “sun” were translated in over 200 versions collected in the field, taking advantage of orthographic similarity, word position and context. Occurrences of the translations were then extracted from the phone-aligned recordings. The results were judged accurate in 96–97% of cases, both on the development corpus and a test set of unseen data. Corrected alignments were then mapped and basemaps were drawn to make various linguistic phenomena immediately visible. The paper exemplifies how regular expressions may be used for this purpose. The final result, which takes the form of an online speaking atlas (enriching the https://atlas.limsi.fr website), enables us to illustrate lexical, morphological or phonetic variation.
2020
Automatic Extraction of Verb Paradigms in Regional Languages: the case of the Linguistic Crescent varieties
Elena Knyazeva | Gilles Adda | Philippe Boula de Mareüil | Maximilien Guérin | Nicolas Quint
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Language documentation is crucial for endangered varieties all over the world. Verb conjugation is a key aspect of this documentation for Romance varieties such as those spoken in central France, in the area of the Linguistic Crescent, which extends over significant portions of the old provinces of Marche and Bourbonnais. We present a first methodological experiment using automatic speech processing tools for the extraction of verbal paradigms collected and recorded during fieldwork sessions conducted in situ. To demonstrate the feasibility of the approach, we test it with different protocols on good-quality data, and we suggest possible extensions of this research.
2018
Les méthodes « apprendre à chercher » en traitement automatique des langues : un état de l’art [A survey of learning-to-search techniques in Natural Language Processing]
Elena Knyazeva | Guillaume Wisniewski | François Yvon
Traitement Automatique des Langues, Volume 59, Numéro 1 : Varia [Varia]
2016
LIMSI@WMT’16: Machine Translation of News
Alexandre Allauzen | Lauriane Aufrant | Franck Burlot | Ophélie Lacroix | Elena Knyazeva | Thomas Lavergne | Guillaume Wisniewski | François Yvon
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
The QT21/HimL Combined Machine Translation System
Jan-Thorsten Peter | Tamer Alkhouli | Hermann Ney | Matthias Huck | Fabienne Braune | Alexander Fraser | Aleš Tamchyna | Ondřej Bojar | Barry Haddow | Rico Sennrich | Frédéric Blain | Lucia Specia | Jan Niehues | Alex Waibel | Alexandre Allauzen | Lauriane Aufrant | Franck Burlot | Elena Knyazeva | Thomas Lavergne | François Yvon | Mārcis Pinnis | Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Two-Step MT: Predicting Target Morphology
Franck Burlot | Elena Knyazeva | Thomas Lavergne | François Yvon
Proceedings of the 13th International Conference on Spoken Language Translation
This paper describes a two-step machine translation system that addresses the issue of translating into a morphologically rich language (English to Czech) by performing translation and the generation of target morphology separately. The first step consists in translating from English into a normalized version of Czech, where some morphological information has been removed. The second step retrieves this information and re-inflects the normalized output, turning it into fully inflected Czech. We introduce different setups for the second step and evaluate the quality of their predictions over different MT systems trained on different amounts of parallel and monolingual data. We also report ways to adapt to different data sizes, which improve translation both in low-resource conditions and when large training data is available.
LIMSI@IWSLT’16: MT Track
Franck Burlot | Matthieu Labeau | Elena Knyazeva | Thomas Lavergne | Alexandre Allauzen | François Yvon
Proceedings of the 13th International Conference on Spoken Language Translation
This paper describes LIMSI’s submission to the MT track of IWSLT 2016. We report results for translation from English into Czech. Our submission is an attempt to address the difficulties of translating into a morphologically rich language by paying special attention to morphology generation on the target side. To this end, we propose two ways of improving the morphological fluency of the output: (1) performing translation and inflection of the target language in two separate steps, and (2) using a neural language model with character-based word representations. Finally, we present the combination of both methods used for our primary system submission.
2015
Apprentissage par imitation pour l’étiquetage de séquences : vers une formalisation des méthodes d’étiquetage easy-first [Imitation learning for sequence labeling: towards a formalization of easy-first labeling methods]
Elena Knyazeva | Guillaume Wisniewski | François Yvon
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Many methods have been proposed to speed up the prediction of structured objects (such as trees or sequences), or to take richer dependencies into account in order to improve prediction performance. These methods generally rely on approximate inference techniques and come with no theoretical guarantee, either on the quality of the solution found or on their learning criterion. In this work, we study a new formulation of structured learning that views it as an incremental process during which the output is built progressively. This framework makes it possible to formalize several existing structured prediction approaches. Thanks to the link we draw between structured learning and reinforcement learning, we are able to propose a theoretically well-founded method for learning approximate inference methods. Experiments on four NLP tasks validate the proposed approach.
LIMSI@WMT’15 : Translation Task
Benjamin Marie | Alexandre Allauzen | Franck Burlot | Quoc-Khanh Do | Julia Ive | Elena Knyazeva | Matthieu Labeau | Thomas Lavergne | Kevin Löser | Nicolas Pécheux | François Yvon
Proceedings of the Tenth Workshop on Statistical Machine Translation
2014
Cross-Lingual POS Tagging through Ambiguous Learning: First Experiments (Apprentissage partiellement supervisé d’un étiqueteur morpho-syntaxique par transfert cross-lingue) [in French]
Guillaume Wisniewski | Nicolas Pécheux | Elena Knyazeva | Alexandre Allauzen | François Yvon
Proceedings of TALN 2014 (Volume 1: Long Papers)