TR-SEQ: Named Entity Recognition Dataset for Turkish Search Engine Queries

Berkay Topçu, İlknur Durgar El-Kahlout


Abstract
Recognizing named entities in short search engine queries is a difficult task because they carry less contextual information than long sentences. Standard named entity recognition (NER) systems that are trained on grammatically correct and long sentences fail to perform well on such queries. In this study, we share our efforts towards creating a cleaned and labeled dataset of real Turkish search engine queries (TR-SEQ) and introduce an extended label set to meet the needs of search engines. We train an NER system by applying BERT, a state-of-the-art deep learning method, to the collected data and report its high performance on search engine queries. Moreover, we compare our results with those of state-of-the-art Turkish NER systems.
Anthology ID:
2021.ranlp-1.158
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1417–1422
URL:
https://aclanthology.org/2021.ranlp-1.158
Cite (ACL):
Berkay Topçu and İlknur Durgar El-Kahlout. 2021. TR-SEQ: Named Entity Recognition Dataset for Turkish Search Engine Queries. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1417–1422, Held Online. INCOMA Ltd.
Cite (Informal):
TR-SEQ: Named Entity Recognition Dataset for Turkish Search Engine Queries (Topçu & Durgar El-Kahlout, RANLP 2021)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2021.ranlp-1.158.pdf