Francis Chantree


2008

Cleaneval: a Competition for Cleaning Web Pages
Marco Baroni | Francis Chantree | Adam Kilgarriff | Serge Sharoff
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Cleaneval is a shared task and competitive evaluation on the topic of cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus for linguistic and language technology research and development. The first exercise took place in 2007. We describe how it was set up, its results, and the lessons learnt.