Adrian Perez
2022
Holistic Evaluation of Automatic TimeML Annotators
Mustafa Ocal
|
Adrian Perez
|
Antonela Radas
|
Mark Finlayson
Proceedings of the Thirteenth Language Resources and Evaluation Conference
TimeML is a scheme for representing temporal information (times, events, & temporal relations) in texts. Although automatic TimeML annotation is challenging, there has been notable progress, with F1s of 0.8–0.9 for events and time detection subtasks, and F1s of 0.5–0.7 for relation extraction. Individually, these subtask results are reasonable, even good, but when combined to generate a full TimeML graph, is overall performance still acceptable? We present a novel suite of eight metrics, combined with a new graph-transformation experimental design, for holistic evaluation of TimeML graphs. We apply these metrics to four automatic TimeML annotation systems (CAEVO, TARSQI, CATENA, and ClearTK). We show that on average 1/3 of the TimeML graphs produced using these systems are inconsistent, and there is on average 1/5 more temporal indeterminacy than the gold-standard. We also show that the automatically generated graphs are on average 109 edits from the gold-standard, which is 1/3 toward complete replacement. Finally, we show that the relationship individual subtask performance and graph quality is non-linear: small errors in TimeML subtasks result in rapid degradation of final graph quality. These results suggest current automatic TimeML annotators are far from optimal and significant further improvement would be useful.
2020
Distinguishing Between Foreground and Background Events in News
Mohammed Aldawsari
|
Adrian Perez
|
Deya Banisakher
|
Mark Finlayson
Proceedings of the 28th International Conference on Computational Linguistics
Determining whether an event in a news article is a foreground or background event would be useful in many natural language processing tasks, for example, temporal relation extraction, summarization, or storyline generation. We introduce the task of distinguishing between foreground and background events in news articles as well as identifying the general temporal position of background events relative to the foreground period (past, present, future, and their combinations). We achieve good performance (0.73 F1 for background vs. foreground and temporal position, and 0.79 F1 for background vs. foreground only) on a dataset of news articles by leveraging discourse information in a featurized model. We release our implementation and annotated data for other researchers
Search