
=======================================
Word level discourse signal annotation
=======================================

This is the word-level discourse signal annotation converted from
the RST Signalling Corpus, as described in the paper.

Each file contains sequences of tags separated by whitespace.
Each line corresponds to one line of tokenized text in the Penn Treebank.
Multiple tags on a single token are joined by '_'.
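Given this layout, a file can be read into one list per line, with each line holding one list of tags per token. The following Python sketch assumes the files are UTF-8 text. The helper names `split_line` and `read_tag_file` are hypothetical and are not part of the distributed data.

```python
from pathlib import Path


def split_line(line: str) -> list[list[str]]:
    """Split one annotation line into per-token lists of tags.

    Tokens are separated by whitespace; the tags of one token are
    joined by '_'.
    """
    return [token.split("_") for token in line.split()]


def read_tag_file(path: str) -> list[list[list[str]]]:
    """Return one entry per line of the file (hypothetical helper).

    Assumes UTF-8 text. Empty lines become empty lists, so entry i
    stays aligned with line i of the tokenized text.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [split_line(line) for line in text.splitlines()]


print(split_line("0 12 13_14 nb113-114 0"))
# [['0'], ['12'], ['13', '14'], ['nb113-114'], ['0']]
```

Keeping empty lines as empty lists preserves the line-by-line correspondence with the Penn Treebank text.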

=================
Meaning of tags:
=================
1) Numbers are signal indices into the RST Signalling Corpus, except '0',
   which denotes the empty tag.

	e.g. 12        means the token is part of signal_12.
	     13_14     means the token is part of signal_13 as well as signal_14.
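Under this rule, the plain-number tags on a token give the set of signals it belongs to. The minimal sketch below takes the tag string of a single token as input. The function name `signal_indices` is hypothetical. Tags containing '-' or a boundary marker are left to the rules that follow and are skipped here.

```python
def signal_indices(token: str) -> set[int]:
    """Return the signal indices a token belongs to.

    `token` is the tag string of one token, e.g. "13_14". Plain numbers
    are signal indices; '0' is the empty tag and contributes nothing.
    Tags with '-' or a boundary marker are ignored by this helper.
    """
    indices: set[int] = set()
    for tag in token.split("_"):
        if tag.isdecimal() and int(tag) != 0:
            indices.add(int(tag))
    return indices


print(signal_indices("12"))     # {12}
print(signal_indices("13_14"))  # {13, 14}
print(signal_indices("0"))      # set()
```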

2) Numbers joined by '-' are the signal indices of a single relation, which is
   identified by the signals that mark it. This form is always used together
   with the boundary markers explained below.

	e.g. 113-114   refers to the relation signalled by signal_113 and signal_114.
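A relation identifier can be converted into a tuple of signal indices. The tuple is convenient as a dictionary key when matching boundary markers that belong to the same relation. In the sketch below, `relation_signals` is a hypothetical helper. It assumes the identifier has already been separated from its boundary marker.

```python
def relation_signals(relation_id: str) -> tuple[int, ...]:
    """Split a relation identifier such as "113-114" into signal indices.

    The original order is kept, so two tags with the same identifier
    yield equal tuples. Raises ValueError on malformed input.
    """
    parts = relation_id.split("-")
    if not all(part.isdecimal() for part in parts):
        raise ValueError(f"malformed relation id: {relation_id!r}")
    return tuple(int(part) for part in parts)


print(relation_signals("113-114"))  # (113, 114)
```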

3) 'nb', 'ne', 'sb', and 'se' mark the beginning and end of EDUs (elementary discourse units).
    nb: nucleus beginning
    ne: nucleus end
    sb: satellite beginning
    se: satellite end
    An nb/sb followed by an ne/se of the same relation marks the 'boundary' of that relation.

	e.g. nb113-114  means the token is the beginning of the nucleus span of
	     the relation signalled by signal_113 and signal_114.

	     se120-122_120 means the token is the end of the satellite span of
	     the relation signalled by signal_120 and signal_122, and is also
	     part of signal_120.
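Taken together, the three rules make it possible to decode a token's tags and pair the boundary markers into spans. The Python sketch below is one possible reading of the format and is not the corpus's own tooling. It rests on these assumptions:

- nb pairs with ne, and sb pairs with se.
- The caller passes the tag strings of consecutive tokens as one flat list, so spans may cross line breaks.
- The pattern also accepts a marker followed by a single index (e.g. "nb7"), in case such tags occur.

`parse_token`, `relation_spans`, and `Boundary` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Boundary tag: marker followed by a relation id such as "113-114".
# Assumption: a single index after a marker (e.g. "nb7") is also accepted.
BOUNDARY = re.compile(r"(nb|ne|sb|se)(\d+(?:-\d+)*)")
ROLE = {"nb": "nucleus", "ne": "nucleus", "sb": "satellite", "se": "satellite"}
Key = tuple[tuple[int, ...], str]  # (relation signal indices, role)


class Boundary(NamedTuple):
    marker: str                # 'nb', 'ne', 'sb' or 'se'
    relation: tuple[int, ...]  # signal indices identifying the relation


def parse_token(token: str) -> tuple[set[int], list[Boundary]]:
    """Split one token's tag string into signal indices and boundary markers."""
    signals: set[int] = set()
    boundaries: list[Boundary] = []
    for tag in token.split("_"):
        match = BOUNDARY.fullmatch(tag)
        if match:
            relation = tuple(int(i) for i in match.group(2).split("-"))
            boundaries.append(Boundary(match.group(1), relation))
        elif tag.isdecimal():
            if int(tag) != 0:  # '0' is the empty tag
                signals.add(int(tag))
        else:
            raise ValueError(f"unrecognized tag: {tag!r}")
    return signals, boundaries


def relation_spans(tokens: list[str]) -> dict[Key, list[tuple[int, int]]]:
    """Pair each nb/sb with a later ne/se of the same relation and role.

    Returns inclusive (start, end) token positions within `tokens`.
    """
    open_starts: defaultdict[Key, list[int]] = defaultdict(list)
    spans: defaultdict[Key, list[tuple[int, int]]] = defaultdict(list)
    for pos, token in enumerate(tokens):
        _, boundaries = parse_token(token)
        # Beginnings first, so a one-token span ("nbX_neX") still pairs up.
        for b in sorted(boundaries, key=lambda m: m.marker.endswith("e")):
            key = (b.relation, ROLE[b.marker])
            if b.marker.endswith("b"):
                open_starts[key].append(pos)
            elif open_starts[key]:
                spans[key].append((open_starts[key].pop(), pos))
            else:
                raise ValueError(f"{b.marker} at token {pos} has no matching start")
    return dict(spans)


print(parse_token("se120-122_120"))
# ({120}, [Boundary(marker='se', relation=(120, 122))])
tokens = "nb113-114 0 ne113-114 sb113-114 12 se113-114".split()
print(relation_spans(tokens))
# {((113, 114), 'nucleus'): [(0, 2)], ((113, 114), 'satellite'): [(3, 5)]}
```

Unknown tags and unmatched end markers raise errors rather than being skipped silently. This makes any mismatch between the sketch's assumptions and the actual files visible.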



