Tran Van Canh


2014

pdf
Extending HeidelTime for Temporal Expressions Referring to Historic Dates
Jannik Strötgen | Thomas Bögel | Julian Zell | Ayser Armiti | Tran Van Canh | Michael Gertz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Research on temporal tagging has achieved a lot of attention during the last years. However, most of the work focuses on processing news-style documents. Thus, references to historic dates are often not well handled by temporal taggers although they frequently occur in narrative-style documents about history, e.g., in many Wikipedia articles. In this paper, we present the AncientTimes corpus containing documents about different historic time periods in eight languages, in which we manually annotated temporal expressions. Based on this corpus, we explain the challenges of temporal tagging documents about history. Furthermore, we use the corpus to extend our multilingual, cross-domain temporal tagger HeidelTime to extract and normalize temporal expressions referring to historic dates, and to demonstrate HeidelTime’s new capabilities. Both, the AncientTimes corpus as well as the new HeidelTime version are made publicly available.