The streams used in this paper are from the Chinese and English Gigaword Corpora, which are released by the Linguistic Data Consortium (LDC). As described in our paper, we mainly used the AFP (Agence France Presse) and APW (Associated Press Worldstream) sections and kept only the news documents whose type is "story".

Here, we provide our annotations for bilingual lexicon extraction and stream alignment. This folder contains Chinese-English translations for the 2008 AFP/APW stream and the 2010 AFP/AFP stream, which are used in Section 4.2. The translation pairs can also be used to evaluate the stream alignments in Section 4.1.

Specifically, each line in *_bursty_oov.txt is a Chinese out-of-vocabulary word, and the corresponding line in *_gold.txt lists its correct English translations. Because a Chinese word may have multiple correct English translations, the translations on a line of *_gold.txt are separated by '\t'.
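
The line-by-line pairing described above can be read with a short script. The following is a minimal sketch, not part of the released data: the function names pair_translations and load_gold_lexicon are hypothetical, and UTF-8 encoding is an assumption that may need to be changed for your copy of the files.

```python
from itertools import zip_longest


def pair_translations(oov_lines, gold_lines):
    """Pair each Chinese OOV word with the translations on the same line number."""
    lexicon = {}
    pairs = zip_longest(oov_lines, gold_lines)
    for line_no, (oov, gold) in enumerate(pairs, start=1):
        # The two files are expected to have the same number of lines.
        if oov is None or gold is None:
            raise ValueError(f"files differ in length at line {line_no}")
        word = oov.strip()
        if not word:
            continue  # skip blank lines
        # Multiple correct translations are separated by tabs.
        translations = [t.strip() for t in gold.split("\t") if t.strip()]
        lexicon.setdefault(word, []).extend(translations)
    return lexicon


def load_gold_lexicon(oov_path, gold_path, encoding="utf-8"):
    """Read a *_bursty_oov.txt / *_gold.txt pair (encoding is an assumption)."""
    with open(oov_path, encoding=encoding) as oov_file, \
            open(gold_path, encoding=encoding) as gold_file:
        return pair_translations(oov_file, gold_file)
```

Calling load_gold_lexicon with the paths of a matching *_bursty_oov.txt and *_gold.txt pair returns a dictionary that maps each Chinese word to its list of accepted English translations.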

Moreover, we provide a sample result (sample_result.txt) obtained by aligning the AFP Chinese-English news streams. Each line is an aligned node pair, in the following format:

Chinese_word \t Chinese_bursty_period \t English_word \t English_bursty_period \t overall_score \t year
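
A line in this format can be split into named fields as sketched below. This is an illustration under assumptions, not an official reader: the names AlignedPair and parse_result_line are hypothetical, overall_score is assumed to be a decimal number, year is assumed to be an integer, and the bursty periods are kept as raw strings because their internal format is not specified here.

```python
from typing import NamedTuple


class AlignedPair(NamedTuple):
    chinese_word: str
    chinese_bursty_period: str  # kept as a raw string (format not assumed)
    english_word: str
    english_bursty_period: str  # kept as a raw string (format not assumed)
    overall_score: float        # assumption: parseable as a float
    year: int                   # assumption: parseable as an integer


def parse_result_line(line):
    """Split one tab-separated line of sample_result.txt into an AlignedPair."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    zh, zh_period, en, en_period, score, year = fields
    return AlignedPair(zh, zh_period, en, en_period, float(score), int(year))
```

Applying parse_result_line to every line of sample_result.txt yields records that can be filtered by year or sorted by overall_score.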
