Michael Gertz

2020

pdf bib abs
UniHD@CL-SciSumm 2020: Citation Extraction as Search
Dennis Aumiller | Satya Almasian | Philip Hausner | Michael Gertz
Proceedings of the First Workshop on Scholarly Document Processing

This work presents the entry by the team from Heidelberg University in the CL-SciSumm 2020 shared task at the Scholarly Document Processing workshop at EMNLP 2020. As in its previous iterations, the task is to highlight relevant parts in a reference paper, depending on a citance text excerpt from a citing paper. We participated in tasks 1A (citation identification) and 1B (citation context classification). Contrary to most previous works, we frame Task 1A as a search relevance problem, and introduce a 2-step re-ranking approach, which consists of a preselection based on BM25 in addition to positional document features, and a top-k re-ranking with BERT. For Task 1B, we follow previous submissions in applying methods that deal well with low resources and imbalanced classes.

2017

pdf bib abs
HeidelPlace: An Extensible Framework for Geoparsing
Ludwig Richter | Johanna Geiß | Andreas Spitz | Michael Gertz
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Geographic information extraction from textual data sources, called geoparsing, is a key task in text processing and central to subsequent spatial analysis approaches. Several geoparsers are available that support this task, each with its own (often limited or specialized) gazetteer and its own approaches to toponym detection and resolution. In this demonstration paper, we present HeidelPlace, an extensible framework in support of geoparsing. Key features of HeidelPlace include a generic gazetteer model that supports the integration of place information from different knowledge bases, and a pipeline approach that enables an effective combination of diverse modules tailored to specific geoparsing tasks. This makes HeidelPlace a valuable tool for testing and evaluating different gazetteer sources and geoparsing methods. In the demonstration, we show how to set up a geoparsing workflow with HeidelPlace and how it can be used to compare and consolidate the output of different geoparsing approaches.

2015

pdf bib
HeidelToul: A Baseline Approach for Cross-document Event Ordering
Bilel Moulahi | Jannik Strötgen | Michael Gertz | Lynda Tamine
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
A Baseline Temporal Tagger for all Languages
Jannik Strötgen | Michael Gertz
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib abs
Computational Narratology: Extracting Tense Clusters from Narrative Texts
Thomas Bögel | Jannik Strötgen | Michael Gertz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Computational Narratology is an emerging field within the Digital Humanities. In this paper, we tackle the problem of extracting temporal information as a basis for event extraction and ordering, as well as further investigations of complex phenomena in narrative texts. While most existing systems focus on news texts and extract explicit temporal information exclusively, we show that this approach is not feasible for narratives. Based on tense information of verbs, we define temporal clusters as an annotation task and validate the annotation schema by showing that the task can be performed with high inter-annotator agreement. To alleviate and reduce the manual annotation effort, we propose a rule-based approach to robustly extract temporal clusters using a multi-layered and dynamic NLP pipeline that combines off-the-shelf components in a heuristic setting. Comparing our results against human judgements, our system is capable of predicting the tense of verbs and sentences with very high reliability: for the most prevalent tense in our corpus, more than 95% of all verbs are annotated correctly.

pdf bib abs
Extending HeidelTime for Temporal Expressions Referring to Historic Dates
Jannik Strötgen | Thomas Bögel | Julian Zell | Ayser Armiti | Tran Van Canh | Michael Gertz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Research on temporal tagging has achieved a lot of attention during the last years. However, most of the work focuses on processing news-style documents. Thus, references to historic dates are often not well handled by temporal taggers although they frequently occur in narrative-style documents about history, e.g., in many Wikipedia articles. In this paper, we present the AncientTimes corpus containing documents about different historic time periods in eight languages, in which we manually annotated temporal expressions. Based on this corpus, we explain the challenges of temporal tagging documents about history. Furthermore, we use the corpus to extend our multilingual, cross-domain temporal tagger HeidelTime to extract and normalize temporal expressions referring to historic dates, and to demonstrate HeidelTime’s new capabilities. Both, the AncientTimes corpus as well as the new HeidelTime version are made publicly available.

pdf bib
Chinese Temporal Tagging with HeidelTime
Hui Li | Jannik Strötgen | Julian Zell | Michael Gertz
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

pdf bib
HeidelTime: Tuning English and Developing Spanish Resources for TempEval-3
Jannik Strötgen | Julian Zell | Michael Gertz
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf bib abs
Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards
Jannik Strötgen | Michael Gertz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In the last years, temporal tagging has received increasing attention in the area of natural language processing. However, most of the research so far concentrated on processing news documents. Only recently, two temporal annotated corpora of narrative-style documents were developed, and it was shown that a domain shift results in significant challenges for temporal tagging. Thus, a temporal tagger should be aware of the domain associated with documents that are to be processed and apply domain-specific strategies for extracting and normalizing temporal expressions. In this paper, we analyze the characteristics of temporal expressions in different domains. In addition to news- and narrative-style documents, we add two further document types, namely colloquial and scientific documents. After discussing the challenges of temporal tagging on the different domains, we describe some strategies to tackle these challenges and describe their integration into our publicly available temporal tagger HeidelTime. Our cross-domain evaluation validates the benefits of domain-sensitive temporal tagging. Furthermore, we make available two new temporally annotated corpora and a new version of HeidelTime, which now distinguishes between four document domain types.