TimeBankPT: A TimeML Annotated Corpus of Portuguese

Francisco Costa, António Branco


Abstract
In this paper, we introduce TimeBankPT, a TimeML annotated corpus of Portuguese. It was produced by adapting an existing resource for English, namely the data used in the first TempEval challenge. TimeBankPT is the first corpus of Portuguese with rich temporal annotations (i.e. it includes annotations not only of temporal expressions but also of events and temporal relations). In addition, it was subjected to an automated error mining procedure that checks the consistency of the annotated temporal relations based on their logical properties. This procedure allowed for the detection of some errors in the annotations that also affect the original English corpus. The Portuguese language is currently undergoing a spelling reform, and several countries where Portuguese is an official language are in a transitional period in which both the old and the new orthographies are valid. TimeBankPT adopts the new orthography in order to preserve its future usefulness. TimeBankPT is freely available for download.
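The abstract does not say how the error mining procedure works. As an illustration of what checking temporal relations against their logical properties can mean, here is a minimal sketch in Python. It assumes relations are given as (source, label, target) triples and interprets only the BEFORE and AFTER labels, treating them as a strict ordering. The function name `find_inconsistency` and the identifiers `e1`, `e2`, `t1` are hypothetical. A cycle in the ordering means the annotations cannot all hold at once.

```python
from collections import defaultdict


def find_inconsistency(relations):
    """Return a cycle of strictly ordered entities, or None if there is none.

    `relations` is an iterable of (source, label, target) triples.
    Assumption: only BEFORE and AFTER are interpreted; every other
    label is ignored by this simplified check.
    """
    # Build a directed graph with an edge a -> b whenever a precedes b.
    graph = defaultdict(set)
    for source, label, target in relations:
        if label == "BEFORE":
            graph[source].add(target)
        elif label == "AFTER":
            graph[target].add(source)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = defaultdict(int)

    def visit(node, path):
        state[node] = GREY
        path.append(node)
        for nxt in sorted(graph[node]):
            if state[nxt] == GREY:
                # Back edge: the ordering loops back onto the current path.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in sorted(graph):
        if state[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


# Hypothetical annotations: e1 < e2 < t1, yet e1 is also marked AFTER t1.
bad = [("e1", "BEFORE", "e2"), ("e2", "BEFORE", "t1"), ("e1", "AFTER", "t1")]
good = [("e1", "BEFORE", "e2"), ("e2", "OVERLAP", "t1")]
print(find_inconsistency(bad))
print(find_inconsistency(good))
```

A fuller check would also reason over the remaining relation labels, which this sketch deliberately leaves out.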
Anthology ID:
L12-1096
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3727–3734
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/246_Paper.pdf
Cite (ACL):
Francisco Costa and António Branco. 2012. TimeBankPT: A TimeML Annotated Corpus of Portuguese. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3727–3734, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
TimeBankPT: A TimeML Annotated Corpus of Portuguese (Costa & Branco, LREC 2012)