Francis Chantree
2008
Cleaneval: a Competition for Cleaning Web Pages
Marco Baroni | Francis Chantree | Adam Kilgarriff | Serge Sharoff
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Cleaneval is a shared task and competitive evaluation on cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus for linguistic and language technology research and development. The first exercise took place in 2007. We describe how it was set up, the results, and the lessons learnt.