Deep Investigation of Cross-Language Plagiarism Detection Methods

Jérémy Ferrero; Laurent Besacier; Didier Schwab; Frédéric Agnès

doi:10.18653/v1/W17-2502

Abstract

This paper is a deep investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units, in order to draw robust conclusions about the best methods while analyzing in depth the correlations across document styles and languages.

Anthology ID: W17-2502
Volume: Proceedings of the 10th Workshop on Building and Using Comparable Corpora
Month: August
Year: 2017
Address: Vancouver, Canada
Editors: Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
Venue: BUCC
Publisher: Association for Computational Linguistics
Pages: 6–15
URL: https://aclanthology.org/W17-2502
DOI: 10.18653/v1/W17-2502
Cite (ACL): Jérémy Ferrero, Laurent Besacier, Didier Schwab, and Frédéric Agnès. 2017. Deep Investigation of Cross-Language Plagiarism Detection Methods. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 6–15, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal): Deep Investigation of Cross-Language Plagiarism Detection Methods (Ferrero et al., BUCC 2017)
PDF: https://preview.aclanthology.org/dois-2013-emnlp/W17-2502.pdf
Presentation: W17-2502.Presentation.pdf
Code: FerreroJeremy/Cross-Language-Dataset
