Named Entity Recognition on Turkish Tweets

Dilek Küçük, Guillaume Jacquet, Ralf Steinberger


Abstract
Various recent studies show that the performance of named entity recognition (NER) systems developed for well-formed text types drops significantly when applied to tweets. The only existing study for the highly inflected agglutinative language Turkish reports a drop in F-Measure from 91% to 19% when ported from news articles to tweets. In this study, we present a new named entity-annotated tweet corpus and a detailed analysis of the various tweet-specific linguistic phenomena. We perform comparative NER experiments with a rule-based multilingual NER system adapted to Turkish on three corpora: a news corpus, our new tweet corpus, and another tweet corpus. Based on the analysis and the experimentation results, we suggest system features required to improve NER results for social media like Twitter.
Anthology ID:
L14-1328
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
450–454
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/380_Paper.pdf
Cite (ACL):
Dilek Küçük, Guillaume Jacquet, and Ralf Steinberger. 2014. Named Entity Recognition on Turkish Tweets. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 450–454, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Named Entity Recognition on Turkish Tweets (Küçük et al., LREC 2014)