To replicate our experiments, follow these steps:


Step 1) download the Chen, Kim and Mooney datasets from
	- http://www.cs.utexas.edu/~ml/clamp/sportscasting/data.tar.gz (English)
	- http://www.cs.utexas.edu/~ml/clamp/sportscasting/data-kr.tar.gz (Korean)
	and extract them to ./xml-files
	English data should be in ./xml-files/data
	Korean data should be in ./xml-files/data-kr
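
Step 1 can be scripted roughly as follows. This is only a sketch and not part of this package: the helper name fetch_and_extract is hypothetical, it assumes curl and tar are installed, and it assumes each archive unpacks into a top-level data or data-kr directory (check after extraction and move the files if it does not).

```shell
# Hypothetical helper (not shipped with this package): download a .tar.gz
# archive and unpack it into a target directory.
# Assumes curl and tar are available.
fetch_and_extract() {
    url=$1
    dest=$2
    archive="$dest/$(basename "$url")"
    mkdir -p "$dest" || return 1
    # -f: fail on HTTP errors, -L: follow redirects, -o: output file
    curl -fL -o "$archive" "$url" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    rm -f "$archive"
}

# Usage (run from this folder):
#   fetch_and_extract http://www.cs.utexas.edu/~ml/clamp/sportscasting/data.tar.gz ./xml-files
#   fetch_and_extract http://www.cs.utexas.edu/~ml/clamp/sportscasting/data-kr.tar.gz ./xml-files
#   ls ./xml-files    # should list data and data-kr
```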

Step 2) download Mark Johnson's cky-parser and inside-outside implementation from
	- http://web.science.mq.edu.au/~mjohnson/code/inside-outside.tgz (inside-outside)
	- http://web.science.mq.edu.au/~mjohnson/code/cky.tbz (cky-parser)
	and extract them to ./software
	(the file-structure should be ./software/inside-outside/<files> and ./software/cky/<files>)
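
Before building, it can help to confirm that Steps 1 and 2 produced the expected folders. The check below is a sketch; check_layout is a hypothetical name and the function only tests that each directory exists and is not empty, not that its contents are complete.

```shell
# Hypothetical helper: report any directory from Steps 1 and 2 that is
# missing or empty. Returns non-zero if at least one is.
check_layout() {
    root=${1:-.}
    rc=0
    for d in xml-files/data xml-files/data-kr software/inside-outside software/cky; do
        if [ -d "$root/$d" ] && [ -n "$(ls -A "$root/$d")" ]; then
            echo "ok:      $d"
        else
            echo "missing: $d"
            rc=1
        fi
    done
    return $rc
}

# Usage (run from this folder):
#   check_layout .
```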

Step 3) build the inside-outside program and the cky parser
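
A possible way to do the build from this folder is sketched below. It assumes that each package contains a Makefile whose default target builds the program; this is an assumption, so consult the packages' own documentation if make fails. The name build_tools is hypothetical.

```shell
# Hypothetical helper: run make in each extracted package.
# Assumes each directory contains a Makefile whose default target
# builds the program.
build_tools() {
    root=${1:-./software}
    for pkg in inside-outside cky; do
        echo "building $pkg"
        ( cd "$root/$pkg" && make ) || { echo "build failed in $root/$pkg" >&2; return 1; }
    done
}

# Usage (run from this folder):
#   build_tools ./software
```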

Step 4) set $HOMEF in make_data.sh to the path of this folder
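
If you prefer not to edit the file by hand, something like the following could work. It is a sketch under a loud assumption: that make_data.sh sets the variable on a line starting with "HOMEF=". This has not been checked against the script, and set_homef is a hypothetical name.

```shell
# Hypothetical helper: rewrite the HOMEF assignment in a script.
# Assumes the script has a line that starts with "HOMEF=" (edit the file
# by hand if its layout differs). Paths containing | or & are not handled.
set_homef() {
    script=$1
    new_path=$2
    grep -q '^HOMEF=' "$script" || { echo "no HOMEF= line in $script" >&2; return 1; }
    # Use | as the sed delimiter because paths contain /.
    sed "s|^HOMEF=.*|HOMEF=\"$new_path\"|" "$script" > "$script.tmp" || return 1
    # Write back with cat so the script keeps its file permissions.
    cat "$script.tmp" > "$script" && rm -f "$script.tmp"
}

# Usage (run from this folder):
#   set_homef ./make_data.sh "$(pwd)"
```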

Step 5) run make_data.sh

Step 6) run run.sh
	this might take some time (several hours for all experiments)
	if you do not have PBS set up, modify run.sh so that it calls the scripts directly instead of passing them to qsub
	(run.sh contains some comments on how to do this)
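
As an illustration of calling the scripts directly, the loop below runs every leave*.sh script found under ./experiments one after another. It is a sketch and not a replacement for the comments in run.sh: run_locally is a hypothetical name, and it assumes each experiment script can be started with bash from the current folder.

```shell
# Hypothetical alternative to the qsub calls: run every leave*.sh script
# under a directory sequentially on the local machine.
# Assumes each script can be started with bash from the current folder.
run_locally() {
    root=${1:-./experiments}
    find "$root" -name 'leave*.sh' | sort | while IFS= read -r script; do
        echo "running $script"
        # < /dev/null keeps the script from consuming the list of names
        bash "$script" < /dev/null || echo "FAILED: $script" >&2
    done
}

# Usage (run from this folder):
#   run_locally ./experiments
#   run_locally ./experiments/WordOrder/English    # only one subset
```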

Step 7) once everything is run, run evaluate_all.sh
	this creates *.eval files in an evaluation subdirectory inside each experiment folder,
	e.g. ./experiments/WordOrder/English/evaluation/leave1_0.1.eval
	each file gives the accuracy on the held-out game (in its last line) and a detailed view of the final parsing process
	to score a setting, average the accuracy over its 4 folds, i.e. leave1_0.1.eval, leave2_0.1.eval, leave3_0.1.eval and leave4_0.1.eval
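
The averaging over folds can be sketched as below. The exact format of the last line is not documented here, so the sketch assumes the accuracy is the last whitespace-separated field on that line; look at one .eval file first and adjust if needed. The name average_folds is hypothetical.

```shell
# Hypothetical helper: average the accuracy over the four folds of one
# setting. Assumes the accuracy is the last whitespace-separated field
# on the last line of each .eval file.
average_folds() {
    eval_dir=$1
    alpha=$2
    for fold in 1 2 3 4; do
        f="$eval_dir/leave${fold}_${alpha}.eval"
        [ -f "$f" ] || { echo "missing $f" >&2; break; }
        tail -n 1 "$f"
    done | awk '{ sum += $NF; n++ }
                END { if (n == 4) printf "%.4f\n", sum / n; else exit 1 }'
}

# Usage (run from this folder):
#   average_folds ./experiments/WordOrder/English/evaluation 0.1
```

The function prints nothing and returns a non-zero status if any of the four files is missing, so an incomplete run is not silently averaged.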



Ad Step 6)	If you want to run fewer experiments, you can modify run.sh or
		call an experiment script directly.
		The experiment scripts are named as follows:
		./experiments/WordOrder/English/leave1_0.5.sh
			this is an experiment with the WO-PCFG model, trained on games 2, 3 and 4 and evaluated on game 1,
			with a Dirichlet prior of alpha=0.5
		./experiments/NoWordOrder/Korean/jitter/leave3_0.1.sh
			this is an experiment with the NoWo-PCFG model, trained on games 1, 2 and 4 and evaluated on game 3,
			with a Dirichlet prior of alpha=0.1 and jittering
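
The naming scheme above can be decoded mechanically, which is handy when selecting a subset of experiments. The helper below is a sketch; describe_experiment is a hypothetical name, and it relies only on the leave<k>_<alpha>.sh pattern, the four games, and the jitter folder described above.

```shell
# Hypothetical helper: decode an experiment script name of the form
# leave<k>_<alpha>.sh into its held-out game, training games and alpha.
describe_experiment() {
    name=$(basename "$1" .sh)          # e.g. leave3_0.1
    held_out=${name#leave}             # e.g. 3_0.1
    held_out=${held_out%%_*}           # e.g. 3
    alpha=${name#*_}                   # e.g. 0.1
    train=""
    for g in 1 2 3 4; do
        [ "$g" = "$held_out" ] || train="${train:+$train,}$g"
    done
    case $1 in
        */jitter/*) jitter="yes" ;;
        *)          jitter="no" ;;
    esac
    echo "train=$train test=$held_out alpha=$alpha jitter=$jitter"
}

# Usage:
#   describe_experiment ./experiments/NoWordOrder/Korean/jitter/leave3_0.1.sh
#   prints: train=1,2,4 test=3 alpha=0.1 jitter=yes
```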


