Abstract
In this paper we examine the representativeness of the EventCorefBank (ECB; Bejan and Harabagiu, 2010) with regard to the language population of large-volume streams of news. The ECB corpus is one of the data sets used to evaluate event coreference resolution. Our analysis shows that the ECB in most cases covers one seminal event per domain, which considerably simplifies the event diversity, and hence the language diversity, that one comes across in the news. We augmented the corpus with a new component consisting of 502 texts that describe different instances of the event types already captured by the 43 topics of the ECB, making it more representative of news articles on the web. The new “ECB+” corpus is available for further research.
- Anthology ID:
- L14-1646
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 4545–4552
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/840_Paper.pdf
- Cite (ACL):
- Agata Cybulska and Piek Vossen. 2014. Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4545–4552, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution (Cybulska & Vossen, LREC 2014)