Tijn Porcelijn


2006

Identifying Named Entities in Text Databases from the Natural History Domain
Caroline Sporleder | Marieke van Erp | Tijn Porcelijn | Antal van den Bosch | Pim Arntzen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we investigate whether it is possible to bootstrap a named entity tagger for textual databases by exploiting the database structure to automatically generate domain- and database-specific gazetteer lists. We compare three tagging strategies: (i) using the extracted gazetteers in a look-up tagger, (ii) using the gazetteers to automatically extract training data to train a database-specific tagger, and (iii) using a generic named entity tagger. Our results suggest that automatically built gazetteers in combination with a look-up tagger lead to relatively good performance and that generic taggers do not perform particularly well on this type of data.

Spotting the ‘Odd-one-out’: Data-Driven Error Detection and Correction in Textual Databases
Caroline Sporleder | Marieke van Erp | Tijn Porcelijn | Antal van den Bosch
Proceedings of the Workshop on Adaptive Text Extraction and Mining (ATEM 2006)