Mathieu Mangeot

Also published as: Mathieu Mangeot-Lerebours


2018

We work on improving the Cesselin, a large, open-source Japanese-French bilingual dictionary digitized by OCR, available on the web, and open to collaborative improvement online. Labelling its examples (about 226,000) would significantly enhance their usefulness for language learners. The examples are proverbs, idiomatic constructions, normal usage examples, and, for nouns, phrases containing a quantifier. Proverbs are easy to spot, but examples of the other types are not. To find a method for annotating them automatically, or at least semi-automatically, we studied many entries and hypothesized that the degree of lexical similarity between the results of MT into a third language might give good cues. To test that hypothesis, we sampled 500 examples and used Google Translate to translate both their Japanese expressions and their French translations into English. The hypothesis holds well, in particular for distinguishing examples of normal usage from idiomatic examples. Finally, we propose a detailed annotation procedure and discuss its future automation.
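As a rough illustration of the hypothesis, the sketch below compares the two English MT outputs of one example (one obtained from the Japanese expression, one from its French translation). The abstract does not name the similarity measure, so Jaccard overlap of word sets is an assumption here, as are the function names, the 0.5 threshold, and the label strings. The translations are passed in as plain strings; no translation service is called.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def lexical_similarity(a, b):
    """Jaccard overlap of the word sets of two sentences (assumed measure)."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def label_example(en_from_japanese, en_from_french, threshold=0.5):
    """Hypothetical labelling rule: high overlap suggests normal usage,
    low overlap suggests the example may be idiomatic."""
    score = lexical_similarity(en_from_japanese, en_from_french)
    return "normal usage" if score >= threshold else "possibly idiomatic"
```

A threshold like this would have to be tuned on annotated samples, and a label produced this way is only a cue for a human annotator, in line with the semi-automatic procedure the abstract describes.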

2017

2014

Economic issues related to information processing techniques are very important. The development of such technologies is a major asset for developing countries like Cambodia and Laos, and for emerging ones like Vietnam, Malaysia and Thailand. The MotAMot project aims to computerize an under-resourced language: Khmer, spoken mainly in Cambodia. The main goal of the project is the development of a multilingual lexical system targeting Khmer. The macrostructure is pivot-based, with each word sense of each language linked to a pivot axie. The microstructure is a simplification of that of the explanatory and combinatory dictionary. The lexical system was initialized with data coming mainly from the conversion of Denis Richer's French-Khmer bilingual dictionary from Word to XML format. The French part was completed with pronunciations and parts of speech taken from the FeM French-English-Malay dictionary. The Khmer headwords, noted in IPA in the Richer dictionary, were converted to Khmer script with OpenFST, a finite-state transducer tool. The resulting resource is available online for lookup, editing, download and remote programming via a REST API on a Jibiki platform.

2013

2012

2010

2006

This paper presents the use of the “Jibiki” generic online dictionary development platform in the GDEF Estonian-French bilingual dictionary building project. The platform has been developed mainly by Mathieu Mangeot and Gilles Sérasset, based on their research work in the domain. It is generic and can thus be used in (almost) any kind of dictionary development project, from simple monolingual lexicons to complex multilingual pivot dictionaries as well as terminological resources. Because the platform is available online, entry writers can work and collaborate from anywhere in the world. It consists of two main modules and data management tools: one module for building complex queries on the data and one module for editing entries online. The editing module automatically generates an interface from the XML structure of the entry.
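The idea of deriving an editing interface from an entry's XML structure can be sketched as follows. This is not Jibiki's implementation: the entry layout, element names, and the `form_fields` helper are hypothetical, and the sketch only lists one editable field per leaf element instead of rendering a real form.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry; the actual GDEF entry structure is not given in the abstract.
SAMPLE_ENTRY = (
    "<entry>"
    "<headword>maja</headword>"
    "<pos>noun</pos>"
    "<sense><translation>maison</translation></sense>"
    "</entry>"
)


def form_fields(element, prefix=""):
    """Walk the XML tree and return one (path, current value) pair per leaf
    element; each pair stands for one input field of the editing form."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        return [(path, (element.text or "").strip())]
    fields = []
    for child in children:
        fields.extend(form_fields(child, path))
    return fields


fields = form_fields(ET.fromstring(SAMPLE_ENTRY))
for path, value in fields:
    print(f"{path}: [{value}]")
```

Because the fields are computed from the tree itself, the same function works for any entry structure, which is the property that makes a generic editor possible.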

2004

2002