Abstract
Plagiarism is a major issue in science and education. Complex plagiarism, such as plagiarism of ideas, is hard to detect, which makes it especially important to correctly track improvements in detection methods. In this paper, we study the performance of plagdet, the main measure for plagiarism detection, on manually paraphrased datasets (such as PAN Summary). We reveal its fallibility under certain conditions and propose an evaluation framework with normalization of inner terms, which is resilient to dataset imbalance. We conclude with an experimental justification of the proposed measure. The implementation of the new framework is publicly available as a GitHub repository.
- Anthology ID:
- P18-2026
- Volume:
- Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 157–162
- URL:
- https://aclanthology.org/P18-2026
- DOI:
- 10.18653/v1/P18-2026
- Cite (ACL):
- Anton Belyy, Marina Dubova, and Dmitry Nekrasov. 2018. Improved Evaluation Framework for Complex Plagiarism Detection. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 157–162, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- Improved Evaluation Framework for Complex Plagiarism Detection (Belyy et al., ACL 2018)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/P18-2026.pdf
- Code
- AVBelyy/normplagdet
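
For context, the plagdet measure that the abstract critiques combines precision, recall, and granularity into a single score. Below is a minimal sketch of the standard formulation (Potthast et al., 2010), not the normalized variant proposed in this paper; the function name and signature are my own illustrative choices:

```python
import math

def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Standard plagdet score: the F1 of precision and recall,
    penalized by granularity (how many separate fragments a
    detector splits a single plagiarism case into; granularity >= 1)."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)

# A perfect, single-fragment detection scores 1.0:
print(plagdet(1.0, 1.0, 1.0))  # → 1.0
# Splitting the same case into 3 fragments halves the score:
print(plagdet(1.0, 1.0, 3.0))  # → 0.5
```

The logarithmic granularity penalty is what keeps the score in [0, 1] while still discouraging fragmented detections; the paper's contribution concerns how the inner precision/recall terms behave on imbalanced, manually paraphrased data.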