Deep Investigation of Cross-Language Plagiarism Detection Methods
Jérémy Ferrero, Laurent Besacier, Didier Schwab, Frédéric Agnès
Abstract
This paper is a deep investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages.
- Anthology ID:
- W17-2502
- Volume:
- Proceedings of the 10th Workshop on Building and Using Comparable Corpora
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
- Venue:
- BUCC
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6–15
- URL:
- https://aclanthology.org/W17-2502
- DOI:
- 10.18653/v1/W17-2502
- Cite (ACL):
- Jérémy Ferrero, Laurent Besacier, Didier Schwab, and Frédéric Agnès. 2017. Deep Investigation of Cross-Language Plagiarism Detection Methods. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 6–15, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Deep Investigation of Cross-Language Plagiarism Detection Methods (Ferrero et al., BUCC 2017)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/W17-2502.pdf
- Code:
- FerreroJeremy/Cross-Language-Dataset