Feature-Rich Twitter Named Entity Recognition and Classification

Utpal Kumar Sikdar, Björn Gambäck


Abstract
Twitter named entity recognition is the process of identifying proper names in tweets and classifying them into predefined categories. The paper introduces a Twitter named entity system based on a supervised machine learning approach, namely Conditional Random Fields. A large set of different features was developed and used to train the system. The Twitter named entity task can be divided into two parts: i) extraction of named entities from tweets and ii) classification of the extracted names into ten different types. For Twitter named entity recognition on unseen test data, our system obtained the second highest F1 score in the shared task: 63.22%. The system performance on the classification task was worse, with an F1 measure of 40.06% on unseen test data, which was the fourth best of the ten systems participating in the shared task.
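The gap between the two reported F1 scores follows from how the two subtasks are scored: an extracted entity with the right boundaries but the wrong type counts as correct for extraction and as an error for classification. The sketch below illustrates this with a generic span-level F1 over BIO tags. It is not the shared task's official scorer; the BIO encoding, the function names `bio_to_spans` and `span_f1`, and the example labels are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str], typed: bool = True) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed encoding).

    With typed=False every span gets the placeholder type "ENT",
    so only the boundaries matter (the extraction setting).
    """
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = (
            tag.startswith("I-") and start is not None and tag[2:] == label
        )
        if continues:
            continue
        if start is not None:
            spans.add((start, i, label if typed else "ENT"))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray "I-" tag is treated as the start of a new span.
            start, label = i, tag[2:]
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Harmonic mean of span-level precision and recall."""
    true_pos = len(gold & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token tweet: both entities are found,
# but the second one receives the wrong type.
gold_tags = ["B-person", "I-person", "O", "B-geo-loc", "O"]
pred_tags = ["B-person", "I-person", "O", "B-company", "O"]

extraction_f1 = span_f1(
    bio_to_spans(gold_tags, typed=False), bio_to_spans(pred_tags, typed=False)
)
classification_f1 = span_f1(
    bio_to_spans(gold_tags, typed=True), bio_to_spans(pred_tags, typed=True)
)
```

In this toy case the untyped score is perfect while the typed score is halved, because the mistyped entity is both a false positive and a false negative under typed matching.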
Anthology ID:
W16-3922
Volume:
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Bo Han, Alan Ritter, Leon Derczynski, Wei Xu, Tim Baldwin
Venue:
WNUT
Publisher:
The COLING 2016 Organizing Committee
Pages:
164–170
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/W16-3922/
Cite (ACL):
Utpal Kumar Sikdar and Björn Gambäck. 2016. Feature-Rich Twitter Named Entity Recognition and Classification. In Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), pages 164–170, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Feature-Rich Twitter Named Entity Recognition and Classification (Sikdar & Gambäck, WNUT 2016)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/W16-3922.pdf
Data
WNUT 2016 NER