This folder contains (almost) everything needed to replicate our experiments. You still need to download the datasets, the inside-outside program, and the parser we used, as described in HOWTO.

If you have questions, feel free to write an email to benjamin.borschinger@mq.edu.au

A quick-start, step-by-step explanation is given in HOWTO.

Explanation of the Folder Contents:

- README			-	this file
- HOWTO				-	a quick-start explanation of how to replicate our experiments
- src/				-	contains the scripts we used for preparing our experiments
   evaluation/evaluate.py	-	evaluation script
   evaluation/uniqueMeanings.py	-	calculates the number of correctly recovered meanings that were not seen during training
   grammars/generalize.py	-	marginalizes out the context, as described in our paper
   grammars/WordOrder.py	-	automatically instantiates all the rule-schemata for WO-PCFG, given the contexts, meanings and vocabulary
   grammars/NoWordOrder.py	-	automatically instantiates all the rule-schemata for NoWo-PCFG, given the contexts, meanings and vocabulary
   data/processData.py		-	converts the Chen, Kim and Mooney XML files into the format needed for grammatical inference (just strings)
   data/processData_multisets.py -	the same as processData.py, but treats contexts as multisets (as Chen, Kim and Mooney do); we did not find this to make any real difference
   data/xmlparser.py		-	the XML parser used by processData.py to convert the data
   data/matchingExamples.py	-	allows you to calculate gold-standard training-file (mis)matches, assuming the example ids are correct (does not work well for the Korean data)
   data/matchingExamplesStrings.py -	allows you to calculate gold-standard training-file (mis)matches, pairing gold and training files by surface strings (not ids)

- software			-	where you need to put the inside-outside and CKY programs (described in HOWTO)

- xml-data			-	where you need to put the Chen, Kim and Mooney training sets (described in HOWTO)


You should not need to run any of the Python scripts directly. They are, however, commented and usually contain
enough information to be used on their own, if you want to.
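
For readers who do want to look inside the scripts, the quantity that evaluation/uniqueMeanings.py is described as computing (correct meanings recovered that were not seen during training) could be sketched roughly as follows. This is only an illustration and not the code in src/: the function name count_novel_correct, the input format (plain meaning strings and (predicted, gold) pairs), and the decision to count each distinct meaning once are all assumptions.

```python
def count_novel_correct(train_meanings, predictions):
    """Count distinct gold meanings that were predicted correctly
    and never occurred in the training data.

    train_meanings: iterable of meaning strings seen in training (assumed format)
    predictions:    iterable of (predicted_meaning, gold_meaning) pairs (assumed format)
    """
    seen = set(train_meanings)
    novel_correct = set()
    for predicted, gold in predictions:
        # A prediction counts only if it is correct and its meaning is new.
        if predicted == gold and gold not in seen:
            novel_correct.add(gold)
    return len(novel_correct)


# Hypothetical toy data, not taken from the actual datasets.
train = ["pass(purple1,purple2)", "kick(pink3)"]
preds = [
    ("kick(pink3)", "kick(pink3)"),              # correct, but seen in training
    ("pass(pink3,pink4)", "pass(pink3,pink4)"),  # correct and novel
    ("pass(pink3,pink4)", "pass(pink3,pink4)"),  # same novel meaning again
    ("kick(purple1)", "steal(purple1)"),         # novel gold, but wrong
]
result = count_novel_correct(train, preds)
print(result)
```

Counting distinct meanings rather than individual test items is one reading of the script's name; the comments in the script itself are the authority on what it actually does.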
