This evaluation metric is described in:

Naushad UzZaman and James F. Allen. 
Temporal Evaluation, 
The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (Short Paper), Portland, Oregon, USA, June 2011. 

Please cite the above paper in papers that make use of this toolkit. 

This file describes how to use the temporal evaluation toolkit. 

The toolkit contains two directories:
i. code
contains all source code (in Python) and a configuration file (config.txt). You can set the reference annotation, the system output, and other parameters in the configuration file.
ii. data
contains the temporal links of the TimeBank 1.2 corpus <http://www.timeml.org/site/timebank/timebank.html> in TempEval-2 format (2-all-tlinks), the document creation times (3-dct), and the system output (4-system_output).

Running the temporal evaluation program:
% cd Temporal_Evaluation
% python code/temporal_evaluation.py data/2-all-tlinks/filename 
### example 
% python code/temporal_evaluation.py data/2-all-tlinks/ABC19980108.1830.0711.txt
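
To evaluate every reference file in one go, the command above can be repeated for each file. The following is a minimal sketch, not part of the toolkit: the helper names evaluation_commands and run_all are hypothetical, it assumes that reference files end in .txt (as the example file does), that it is run from the Temporal_Evaluation directory, and that "python" on your PATH is an interpreter the toolkit supports.

```python
import os
import subprocess


def evaluation_commands(tlinks_dir="data/2-all-tlinks"):
    """Return one command (as an argv list) per .txt reference file."""
    names = sorted(n for n in os.listdir(tlinks_dir) if n.endswith(".txt"))
    return [
        ["python", "code/temporal_evaluation.py", os.path.join(tlinks_dir, name)]
        for name in names
    ]


def run_all(tlinks_dir="data/2-all-tlinks"):
    """Run the evaluation script once per reference file, stopping on error."""
    for command in evaluation_commands(tlinks_dir):
        subprocess.run(command, check=True)
```

Calling run_all() from the Temporal_Evaluation directory would then issue the same command as the example above for each file in data/2-all-tlinks/.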

To evaluate a new document:
i. create the reference file under data/2-all-tlinks/
ii. create the document creation time file under data/3-dct/
iii. create the system output file under data/4-system_output/
All three files must have the same filename. See the existing files for the file format.
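
Before running the evaluation on a new document, it can help to confirm that the file exists in all three places. The check below is a small sketch under the directory layout described in this file; the helper name missing_dirs is hypothetical and is not part of the toolkit.

```python
import os

# Directory names as listed in this README.
REQUIRED_DIRS = ("2-all-tlinks", "3-dct", "4-system_output")


def missing_dirs(data_root, filename):
    """Return the required directories under data_root that lack `filename`."""
    return [
        d for d in REQUIRED_DIRS
        if not os.path.isfile(os.path.join(data_root, d, filename))
    ]
```

For example, missing_dirs("data", "ABC19980108.1830.0711.txt") returns an empty list when the document is present in all three directories, and otherwise lists the directories where it still has to be created.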

TimeBank documents contain some inconsistencies and in some cases do not comply with temporal closure properties. For example, given the two relations e1 BEFORE e2 and e2 BEFORE e3, adding e1 INCLUDES e3 is incorrect, since e1 BEFORE e3 can be inferred by temporal closure. These inconsistencies arise because TimeBank annotators identified the relations between entities in isolation, without considering temporal closure properties.

When creating the Timegraph, we ignore the inconsistent relations from the TimeBank annotations. In the supplied documents, the system output is identical to the TimeBank annotations. Since verification is done in the Timegraph of the reference or system output, and the Timegraph does not include the relations that violate temporal closure properties, matching does not yield 100% precision/recall, even on these documents.

However, if the consider_direct_match option is set to 'true' in config.txt, the toolkit first looks for an exact match in the annotation/system output before checking the relation in the Timegraph. This option makes searching faster and also gives 100% precision/recall against the reference. Note that the performance (time in seconds) reported in the paper was measured without direct matching: all matching was done in the Timegraph, to get a proper measure of the time taken to search it.
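
The effect described above can be illustrated with a toy example. This is a deliberately simplified sketch, not the toolkit's Timegraph: it models only BEFORE (with transitive closure) and INCLUDES, and the function names before_closure, build_graph and precision are hypothetical. A relation that contradicts what is already inferable is skipped when the graph is built, so it can only be matched when direct matching is enabled.

```python
def before_closure(pairs):
    """Transitive closure of (a, b) pairs, each meaning 'a BEFORE b'."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def build_graph(relations):
    """Add relations in order, skipping those that contradict the closure."""
    kept, skipped, before = [], [], set()
    for e1, rel, e2 in relations:
        inferred = before_closure(before)
        if rel == "BEFORE":
            if (e2, e1) in inferred:  # the opposite order is already inferable
                skipped.append((e1, rel, e2))
                continue
            before.add((e1, e2))
        elif rel == "INCLUDES":
            # Inconsistent if either order between e1 and e2 is inferable.
            if (e1, e2) in inferred or (e2, e1) in inferred:
                skipped.append((e1, rel, e2))
                continue
        kept.append((e1, rel, e2))
    return kept, skipped, before_closure(before)


def precision(system, reference, consider_direct_match):
    """Fraction of system relations verified against the reference."""
    if not system:
        return 0.0
    kept, _, closure = build_graph(reference)
    matched = 0
    for e1, rel, e2 in system:
        if consider_direct_match and (e1, rel, e2) in reference:
            matched += 1  # exact match in the annotation, graph not consulted
        elif rel == "BEFORE" and (e1, e2) in closure:
            matched += 1  # verified through temporal closure
        elif (e1, rel, e2) in kept:
            matched += 1  # consistent relation stored in the graph
    return matched / len(system)


reference = [("e1", "BEFORE", "e2"), ("e2", "BEFORE", "e3"),
             ("e1", "INCLUDES", "e3")]
print(precision(reference, reference, False))  # 2 of 3 relations verified
print(precision(reference, reference, True))   # 1.0
```

With the system output identical to the reference, the inconsistent relation e1 INCLUDES e3 is left out of the graph, so only two of the three relations are verified unless direct matching is switched on.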

If you have any questions, please contact me by email at naushad _at_ cs.rochester.edu 
