Clément de Groc

Also published as: Clément De Groc


2014

Evaluating Web-as-corpus Topical Document Retrieval with an Index of the OpenDirectory
Clément de Groc | Xavier Tannier
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article introduces a novel protocol and resource to evaluate Web-as-corpus topical document retrieval. Unlike previous work, our goal is to provide an automatic, reproducible and robust evaluation for this task. We rely on the OpenDirectory (DMOZ) as a source of topically annotated webpages and index them in a search engine. With this OpenDirectory search engine, we can then easily evaluate the impact of various parameters, such as the number of seed terms, queries or documents, or the usefulness of various term selection algorithms. A first fully automatic evaluation is described and provides baseline performance for this task. The article concludes with practical information regarding the availability of the index and resource files.

Thematic Cohesion: measuring terms discriminatory power toward themes
Clément de Groc | Xavier Tannier | Claude de Loupy
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a new measure of thematic cohesion. This measure associates each term with a weight representing its discriminatory power toward a theme, the theme itself being expressed by a list of terms (a thematic lexicon). This thematic cohesion criterion can be used in many applications, such as query expansion, computer-assisted translation, or the iterative construction of domain-specific lexicons and corpora. The measure is computed in two steps. First, a set of documents related to the terms is gathered from the Web by querying a Web search engine. Then, we build a directed co-occurrence graph whose vertices are the terms and whose edges indicate that two terms co-occur in a document. This graph can be interpreted as a recommendation graph, in which two terms occurring in the same document are considered to recommend each other. This interpretation motivates the use of a random walk algorithm that assigns a global importance value to each vertex of the graph. After examining the impact of various parameters on these importance values, we evaluate their correlation with retrieval effectiveness.
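The two-step computation described above (co-occurrence graph, then random walk) can be illustrated with a minimal sketch. The abstract does not name the exact random walk, so this assumes a PageRank-style walk with damping factor 0.85, co-occurrence counts as edge weights, and documents reduced to sets of terms; the Web retrieval step is skipped and replaced by a small hypothetical document collection. The function names `cooccurrence_graph` and `pagerank` are illustrative, not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(documents, lexicon):
    """Weighted graph over lexicon terms: each document in which terms u and v
    both occur adds weight 1 to the edges u->v and v->u (mutual recommendation)."""
    lexicon = set(lexicon)
    weights = defaultdict(lambda: defaultdict(float))
    for doc in documents:
        present = sorted(lexicon & set(doc))
        for u, v in combinations(present, 2):
            weights[u][v] += 1.0
            weights[v][u] += 1.0
    return weights


def pagerank(weights, nodes, damping=0.85, iterations=100, tol=1e-10):
    """PageRank-style power iteration (assumed walk) returning term -> importance."""
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by terms with no outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not weights.get(u))
        new = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            out = weights.get(u)
            if not out:
                continue
            total = sum(out.values())
            for v, w in out.items():
                # u passes its rank to the terms it recommends, proportionally to weight.
                new[v] += damping * rank[u] * w / total
        delta = sum(abs(new[x] - rank[x]) for x in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy collection standing in for documents retrieved from the Web.
docs = [
    {"neuron", "synapse", "axon"},
    {"neuron", "axon"},
    {"neuron", "football"},
    {"synapse", "dendrite", "neuron"},
]
lexicon = ["neuron", "synapse", "axon", "dendrite", "football"]
rank = pagerank(cooccurrence_graph(docs, lexicon), lexicon)
for term, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {score:.3f}")
```

In this toy run, "neuron", which co-occurs with every other term, receives the highest importance, while "football", which co-occurs only once, ranks well below it. That is the intuition behind using importance values as a proxy for a term's discriminatory power toward the theme.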

2013

Lexicons from Comparable Corpora for Multilingual Information Retrieval (Lexiques de corpus comparables et recherche d’information multilingue) [in French]
Frederik Cailliau | Ariane Cavet | Clément De Groc | Claude De Loupy
Proceedings of TALN 2013 (Volume 2: Short Papers)

2012

Un critère de cohésion thématique fondé sur un graphe de cooccurrences (Topical Cohesion using Graph Random Walks) [in French]
Clément de Groc | Xavier Tannier | Claude de Loupy
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

2011

GrawlTCQ: Terminology and Corpora Building by Ranking Simultaneously Terms, Queries and Documents using Graph Random Walks
Xavier Tannier | Javier Couto | Clément de Groc
Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing

Babouk – exploration orientée du web pour la constitution de corpus et de terminologies (Babouk – oriented exploration of the web for the construction of corpora and terminologies)
Clément de Groc | Javier Couto | Helena Blancafort | Claude de Loupy
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations