These two folders contain all the code needed to reproduce the results from:

Rob van der Goot and Gertjan van Noord. 2017. Parser Adaptation for Social
Media by Integrating Normalization. In Proceedings of the 55th Annual Meeting
of the Association for Computational Linguistics.

The scripts that generate the graph and table from our paper can be found in
the folder /monoise/scripts/. You just need to change the paths to the dataset.
Unfortunately, the Twitter treebank from Foster et al. (2011) is not publicly
available.
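The scripts expect the dataset at a fixed location; a hypothetical sketch of rewriting that path (OLDPATH is an assumption, check each script for the real value, then run sed -i on monoise/scripts/*.sh instead of this dry run):

```shell
# Hypothetical sketch: replace the hard-coded dataset path with your local
# copy. OLDPATH is an assumption -- look inside the scripts for the real one.
OLDPATH='/path/to/dataset'
DATADIR="$HOME/data/lexnorm"
# dry run on one example line; use sed -i on the scripts once it looks right
echo "DATA=$OLDPATH/test.txt" | sed "s|$OLDPATH|$DATADIR|"
```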

The contents of this package are snapshots of two separate repositories:
- MoNoise: A lexical normalization model:
  https://bitbucket.org/robvanderg/monoise (commit: 3d63292)
- BerkeleyGraph: A version of the Berkeley parser which can parse word
  lattices. Note that some other functions are broken.
  https://bitbucket.org/robvanderg/berkeleygraph (commit: f40726d)


If you simply want to run the best working model:
> cd monoise
> ./scripts/prep.sh
> icmbuild
> ./tmp/bin/binary -m RU -i test.txt -r working/chenli -p test.txt -c 6 -a -u

The C++ normalization model communicates with the Java parser through sockets;
if this does not work, use:
> ./tmp/bin/binary -m RU -i test.txt -r working/chenli -c 6 -a -u | java -jar ../BerkeleyGraph/parser.jar -gr enData/ewtwsj.gr -latticeWeight 2
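A quick way to decide between the two invocations is to probe whether the parser's socket is reachable; a hedged sketch (HOST and PORT are assumptions, check the parser's startup output for the actual values):

```shell
# Hypothetical check: probe the parser's socket before using the -p option.
# HOST and PORT are assumptions -- see the parser's startup output.
HOST=localhost
PORT=8000
if timeout 2 bash -c "echo > /dev/tcp/$HOST/$PORT" 2>/dev/null; then
  echo "socket open: use -p"
else
  echo "socket closed: fall back to the pipe"
fi
```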

If you want to train the normalization model yourself (you need approximately 20 GB):
> ./tmp/bin/binary -m TR -i enData/chenli -r working/chenli -a -n 12 -s $i
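The `$i` in the command above suggests training is run once per random seed inside a loop; a sketch of such a loop (the seed range and per-seed working directories are assumptions, and the echo makes this a dry run -- remove it to actually train):

```shell
# Sketch, not verbatim from the README: one training run per seed.
# Seed range and per-seed output dirs are assumptions; echo = dry run.
for i in 1 2 3; do
  echo ./tmp/bin/binary -m TR -i enData/chenli -r working/chenli.$i -a -n 12 -s $i
done
```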
If you want to train the grammar yourself, please use the original
BerkeleyParser.

For more information about the options of the two systems, run them separately:
> cd BerkeleyGraph
> java -jar parser.jar
> cd ../monoise
> icmbuild
> ./tmp/bin/binary
or see monoise/config.en

If you have any questions about the code/systems, don't hesitate to contact:
r.van.der.goot@rug.nl

