Readme WordContextVisualizer Version 0.1
----------------------------------------

System Requirements: Tested on Mac OS X 10.6.7, Windows Vista and Windows 7 with Java 1.6. The Java process must be able to allocate sufficient main memory (1 GB is usually enough).

The tool requires no installation. 

Before Starting: How to use LDA
-------------------------------
The software uses the external program MALLET to perform LDA (Latent Dirichlet Allocation). MALLET is released under the Common Public License: McCallum, Andrew Kachites.  "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002.
You have to download MALLET yourself from http://mallet.cs.umass.edu/. The WordContextVisualizer prepares the input for MALLET and reads the output that MALLET produces:
-input: Each context is stored as an individual text file in a folder "contexts", located in the same folder as the executable of the WordContextVisualizer. MALLET reads these files.
-output: MALLET must save the files "output.mallet", "doc-topics.txt" and "topic-keys.txt" in the same folder as the executable of the WordContextVisualizer.
-processing: MALLET can be started by the WordContextVisualizer (if it has the necessary permissions), or you can start it manually by copying the program call that the WordContextVisualizer prints. For a high number of iterations we recommend starting MALLET manually; otherwise the run started from within the tool might not work.
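
For orientation, the following Python sketch shows what a manual MALLET run matching the folder layout above might look like. It is not the call generated by the WordContextVisualizer; the call printed by the tool is authoritative. The sketch assumes a MALLET 2.0-style command line (import-dir, train-topics) and assumes that "output.mallet" is the imported data file that is then used for topic training. The function name build_mallet_commands, its parameters and the default values are hypothetical.

```python
from pathlib import Path


def build_mallet_commands(mallet_bin, work_dir, num_topics=10, num_iterations=1000):
    """Return two argument lists: the import call and the topic-training call.

    Assumption: work_dir is the folder that contains the executable of the
    WordContextVisualizer and the "contexts" folder.
    """
    work = Path(work_dir)
    # Step 1: convert the individual context files into MALLET's binary format.
    import_cmd = [
        str(mallet_bin), "import-dir",
        "--input", str(work / "contexts"),
        "--output", str(work / "output.mallet"),
        "--keep-sequence",  # required for topic modelling
    ]
    # Step 2: train the LDA model and write the two result files.
    train_cmd = [
        str(mallet_bin), "train-topics",
        "--input", str(work / "output.mallet"),
        "--num-topics", str(num_topics),
        "--num-iterations", str(num_iterations),
        "--output-doc-topics", str(work / "doc-topics.txt"),
        "--output-topic-keys", str(work / "topic-keys.txt"),
    ]
    return import_cmd, train_cmd


if __name__ == "__main__":
    # Print the calls so that they can be inspected or pasted into a shell.
    for cmd in build_mallet_commands("bin/mallet", "."):
        print(" ".join(cmd))
```

The argument lists can be passed to subprocess.run(cmd, check=True) to run both steps one after the other.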

Loading Data/Data Format
------------------------
Double-clicking on WordContextVisualization.bat opens the word context canvas. 

CSV files can be loaded via Data -> Load CSV. The input must be a comma-separated file that fulfills the following requirements:
- Separator is ,
- Masking (quote) character is "
Each line has to contain at least the following four columns: "TimeStamp", "Regex", "Match" and "Context".
- TimeStamp must have the format MM/dd/yy HH:mm:ss (month/day/two-digit year, 24-hour time), e.g. 03/27/11 14:05:00
- Regex is a regular expression describing the term or set of terms for which the contexts were extracted
- Match is a term in the corpus under investigation that was found to match the Regex
- Context is the context of Match within the corpus under investigation, i.e. a fixed number of words before and after Match (including Match)
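
As an illustration of the requirements above, the following Python sketch writes contexts in this format. Two points are assumptions and should be checked against SampleDataFromWikipedia.csv: that the file starts with a header row containing the column names, and that quoting every field is acceptable. The function name write_contexts and the example row are made up for illustration.

```python
import csv
import io
from datetime import datetime

COLUMNS = ["TimeStamp", "Regex", "Match", "Context"]


def write_contexts(rows, out):
    """Write (datetime, regex, match, context) tuples as CSV to the stream out."""
    writer = csv.writer(
        out,
        delimiter=",",          # separator is ,
        quotechar='"',          # masking character is "
        quoting=csv.QUOTE_ALL,  # assumption: quoting every field is accepted
        lineterminator="\n",
    )
    writer.writerow(COLUMNS)    # assumption: a header row is expected
    for when, regex, match, context in rows:
        # MM/dd/yy HH:mm:ss corresponds to %m/%d/%y %H:%M:%S in Python.
        writer.writerow([when.strftime("%m/%d/%y %H:%M:%S"), regex, match, context])


if __name__ == "__main__":
    example = [
        (datetime(2011, 3, 27, 14, 5, 0), "bank(s)?", "banks",
         "the river banks, which flooded"),
    ]
    buffer = io.StringIO()
    write_contexts(example, buffer)
    print(buffer.getvalue(), end="")
```

Because the context contains a comma, it has to be enclosed in the masking character; the csv module takes care of this.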

Pre-compiled session files can be loaded via Data -> Load session: set the file format filter to "ALL Files" and choose a *.Session file. Loading should take only a couple of seconds.

Sample Data
-----------
The data used in the paper cannot be distributed along with the software because of license restrictions; a license for the New York Times Annotated Corpus (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19) is required to obtain it. However, we provide a manually edited sample containing sentences copied from Wikipedia (SampleDataFromWikipedia.csv). You can use this file to test the software and as a template for creating your own CSV files.


Interaction
-----------

On the context canvas, each colored dot represents one context. The color of a dot depends on its time stamp, according to the color map on the right. Time intervals can be chosen with the sliders to the left and the right of the color map; the contexts shown on the canvas change accordingly. Hovering over a dot reveals its context and time stamp.
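
The relation between time stamps, the color map and the interval sliders can be sketched as follows. This is not the tool's implementation; it only assumes that a time stamp is mapped linearly onto the color map and that the two sliders define an inclusive interval. The function names are hypothetical.

```python
from datetime import datetime


def color_position(t, t_min, t_max):
    """Map a time stamp linearly to a position in [0, 1] on the color map."""
    span = (t_max - t_min).total_seconds()
    if span == 0:
        return 0.0  # all contexts share one time stamp
    return (t - t_min).total_seconds() / span


def visible(time_stamps, lower, upper):
    """Keep only the time stamps inside the interval chosen with the sliders."""
    return [t for t in time_stamps if lower <= t <= upper]


if __name__ == "__main__":
    start, end = datetime(2011, 1, 1), datetime(2011, 1, 31)
    print(color_position(datetime(2011, 1, 16), start, end))
```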

The color map can be adjusted in the Options panel below the canvas. One slider increases or decreases the point size of the dots; their opacity can be changed with the Opacity slider on the right.

When contexts are very close together, they may overlap so that they cannot be distinguished easily. Introducing random jitter reduces this clutter. The amount of jitter can be chosen with the Jitter slider on the right. LDA visualizations tend to have heavy overplotting, so increasing the jitter and decreasing the point size is recommended for them.
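
The idea behind jitter can be sketched in a few lines of Python. This is a generic illustration, not the code used by the WordContextVisualizer: every point is shifted by a small random offset, drawn here from a uniform distribution (an assumption), so that coinciding dots become distinguishable. The parameter amount plays the role of the Jitter slider.

```python
import random


def jitter(points, amount, seed=None):
    """Shift every (x, y) point by a random offset of at most amount per axis."""
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


if __name__ == "__main__":
    overlapping = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
    for point in jitter(overlapping, amount=0.05, seed=1):
        print(point)
```

A larger amount separates overlapping dots more clearly but also moves them further away from their true positions.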

Enabling "Show coordinations" displays the coordinates of each context.

The canvas supports geometric zooming with the mouse wheel. The inner canvas can be panned by dragging the mouse while holding down the left mouse button.

The aggregated views (e.g. Figure 1 of the paper) are available only with LDA and appear in a separate window. Like the main canvas, they are zoomable.

Third-Party Libraries
---------------------
Please read the information in the folder "WordContextVisualizerV0.1_lib" to learn more about third-party libraries used by this software.


