There are two main applications:

1) FragmentExtractor.jar: extracts recurring fragments from a treebank.
2) DoubleDOP.jar: parses novel sentences with the DoubleDOP parser.

IMPORTANT: 

* Make sure Java 1.6 or later is installed (to check, type java -version in a terminal).

* Before running DoubleDOP, compile BitPar (included in the archive) as follows:
	cd BitPar/src
	make
	cd ../..


----------------------
1) FragmentExtractor
----------------------

USAGE: java [-Xmx1G] -jar FragmentExtractor.jar [-threads:1] [-markoBinarize:false] [-ukThreshold:-1] treebankFile

*	-Xmx1G -> maximum amount of memory reserved for the program; choose it according to the size of the treebank (G=Gigabytes, M=Megabytes)
*	-threads -> number of threads to run (makes the work a lot faster on a multi-CPU machine, but mind the memory)
*	-markoBinarize -> whether to binarize the corpus (H=1, P=1)
*	-ukThreshold -> the frequency threshold below which words are replaced with features
*	treebankFile -> a file with phrase-structure (PS) trees in Penn Treebank format, such as treebankExample.mrg

A new directory FragmentExtractor_{current_date_time} will be created in the same directory as the treebankFile.
Inside this directory, the file 'fragments_approxFreq.txt' will contain the extracted fragments with their counts.

Example of usage:
   java -jar FragmentExtractor.jar treebankExample.mrg
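
Because the output directory name contains the date and time of the run, it can be convenient to look up the newest one from a script. The following is only a sketch: the function name latest_fragments is hypothetical and not part of the distribution, and it assumes that the most recently modified FragmentExtractor_* directory next to the treebank is the one you want.

```shell
# Hypothetical helper (not part of the distribution): print the path of the
# fragment file inside the most recently modified FragmentExtractor_*
# directory located next to the given treebank file.
latest_fragments() {
    treebank_dir=$(dirname "$1")
    # ls -td lists the matching directories newest first; keep the first one
    latest=$(ls -td "$treebank_dir"/FragmentExtractor_*/ 2>/dev/null | head -n 1)
    # Fail if no output directory exists yet
    [ -n "$latest" ] || return 1
    # Strip the trailing slash and append the fragment file name
    printf '%s\n' "${latest%/}/fragments_approxFreq.txt"
}
```

A possible session with more memory and several threads (the values are illustrative only):
   java -Xmx2G -jar FragmentExtractor.jar -threads:4 treebankExample.mrg
   latest_fragments treebankExample.mrg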
   
----------------------
2) DoubleDOP
----------------------

USAGE: java [-Xmx1G] -jar DoubleDOP.jar [-threads:1] [-nBest:1000] [-markoBinarize:false] [-ukThreshold:-1] [-smoothLexicon:false] trainingTreebankFile testTreebankFile fragmentFile outputDir

*	-Xmx1G -> maximum amount of memory reserved for the program; choose it according to the size of the treebank (G=Gigabytes, M=Megabytes)
*	-threads -> number of threads to run (makes the work a lot faster on a multi-CPU machine, but mind the memory)
*	-nBest -> number of best derivations per test sentence
*	-markoBinarize -> whether to binarize the corpus (H=1, P=1)
*	-ukThreshold -> the frequency threshold below which words are replaced with features
*	trainingTreebankFile -> the training treebank file (CFG rules not present in fragmentFile will be extracted from it)
*	testTreebankFile -> the test treebank (also used for evaluation)
*	fragmentFile -> the file containing the fragments (e.g. extracted with FragmentExtractor)
*	outputDir -> the output directory (if it already exists, a new directory will be created)

Example of usage:
   java -jar DoubleDOP.jar treebankExample.mrg testExample.mrg fragments_approxFreq.txt ./
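
The four positional arguments must be given in the order shown in the USAGE line, after any options. As a minimal sketch, the call can be wrapped in a small shell function that checks the argument count first. The function name run_doubledop and the JAVA variable are hypothetical, and the option values are illustrative rather than recommended settings.

```shell
# Hypothetical wrapper (not part of the distribution) around the DoubleDOP
# command line. JAVA can be overridden, e.g. JAVA=echo, to preview the
# command without running the parser.
JAVA=${JAVA:-java}

run_doubledop() {
    # $1 training treebank, $2 test treebank, $3 fragment file, $4 output dir
    if [ "$#" -ne 4 ]; then
        echo "usage: run_doubledop trainingTreebankFile testTreebankFile fragmentFile outputDir" >&2
        return 2
    fi
    # -Xmx4G, -threads:4 and -nBest:100 are illustrative values only
    "$JAVA" -Xmx4G -jar DoubleDOP.jar -threads:4 -nBest:100 "$1" "$2" "$3" "$4"
}
```

For instance, with the fragment file copied into the current directory:
   run_doubledop treebankExample.mrg testExample.mrg fragments_approxFreq.txt ./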

