Ludovic Quintard

2012

REPERE : premiers résultats d’un défi autour de la reconnaissance multimodale des personnes (REPERE : preliminary results of a multimodal person recognition challenge) [in French]
Juliette Kahn | Aude Giraudel | Matthieu Carré | Olivier Galibert | Ludovic Quintard
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign
Olivier Galibert | Sophie Rosset | Cyril Grouin | Pierre Zweigenbaum | Ludovic Quintard
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Within the framework of the Quaero project, we proposed a new definition of named entities, based upon an extension of the coverage of named entities as well as the structure of those named entities. In this new definition, the extended named entities we proposed are both hierarchical and compositional. In this paper, we focused on the annotation of a corpus composed of press archives, OCRed from French newspapers of December 1890. We present the methodology we used to produce the corpus and the characteristics of the corpus in terms of named entities annotation. This annotated corpus has been used in an evaluation campaign. We present this evaluation, the metrics we used and the results obtained by the participants.

The REPERE Corpus : a multimodal corpus for person recognition
Aude Giraudel | Matthieu Carré | Valérie Mapelli | Juliette Kahn | Olivier Galibert | Ludovic Quintard
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The REPERE Challenge aims to support research on people recognition in multimodal conditions. To assess technological progress, annual evaluation campaigns will be organized from 2012 to 2014. In this context, the REPERE corpus, a corpus of French videos with multimodal annotation, has been developed. This paper presents the datasets collected for the dry run test that took place at the beginning of 2012. It mainly describes the specific annotation tools and guidelines. So far, 6 hours of data have been collected and annotated. The last section presents analyses of annotation distribution and interaction between modalities in the corpus.

2011

Proposal for an Extension of Traditional Named Entities: From Guidelines to Evaluation, an Overview
Cyril Grouin | Sophie Rosset | Pierre Zweigenbaum | Karën Fort | Olivier Galibert | Ludovic Quintard
Proceedings of the 5th Linguistic Annotation Workshop

Structured and Extended Named Entity Evaluation in Automatic Speech Transcriptions
Olivier Galibert | Sophie Rosset | Cyril Grouin | Pierre Zweigenbaum | Ludovic Quintard
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

In the QA and information retrieval domains, progress has been assessed via evaluation campaigns (Clef, Ntcir, Equer, Trec). In these evaluations, the systems handle independent questions and should provide one answer to each question, extracted from textual data, for both open domain and restricted domain. Quæro is a program promoting research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. Among the many research areas that Quæro covers, the project organized a series of evaluations of Question Answering on Web Data systems in 2008 and 2009. For each language, English and French, the full corpus has a size of around 20 Gb for 2.5M documents. We describe the task and corpora, and especially the methodologies used in 2008 to construct the question test set and the new one used in the 2009 campaign. Six types of questions were addressed: factual, non-factual (How, Why, What), List, and Boolean. A description of the participating systems and the obtained results is provided. We show the difficulty for a question-answering system to work with complex data and questions.

The Quæro program promotes research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. Within its context, a set of evaluations of Named Entity recognition systems was held in 2009. Four tasks were defined. The first two concerned traditional named entities, one in French broadcast news (a rerun of ESTER 2) and the other in OCR-ed old newspapers. The third was gene and protein name extraction in medical abstracts. The last one was the detection of references in patents. Four different partners participated, giving a total of 16 systems. We provide a synthetic description of all of them, classifying them by the main approach chosen (resource-based, rule-based or statistical), without forgetting that any modern system is at some point hybrid. The metric (the relatively standard Slot Error Rate) and the results are also presented and discussed. Finally, a process is ongoing, with preliminary acceptance from the partners, to make all the corpora used available to the community, with the exception of the ESTER 2 corpus, which was not produced by Quæro.