This folder contains:
1)	The source code that produces an ".arff" file, to be loaded into WEKA to build the classifier and output predictions.
2)	An ARFF file with the predictions (only a sample covering the first 1400 suspicious documents; the full file is larger). The predictions were produced with WEKA's implementation of Naïve Bayes.
3)	The XML files that contain the detected plagiarism annotations.

P.S. Files 2 and 3 were produced on the PAN-PC-11 corpus using our method in its best configuration, n = 6 and m = 4.
