Data-to-Text Generation (D2T) problems can be considered as a stream of time-stamped events, with a text summary being produced for each event. The problem becomes more challenging when event summaries contain complex insights derived from multiple records, either within an event or across several events from the event stream. Understanding the different types of content present in a summary helps us better define system requirements and, in turn, build better systems. In this paper, we propose a novel typology of content types, which we use to classify the contents of event summaries. Using the typology, we generate a profile of a dataset as the distribution of the aggregated content types, which captures the specific characteristics of the dataset and gives a measure of the complexity present in the problem. Extensive experiments on different D2T datasets demonstrate that neural systems struggle to generate content of complex types.
Generating a Word-Emotion Lexicon from #Emotional Tweets
Anil Bandhakavi | Nirmalie Wiratunga | Deepak P | Stewart Massie
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)