Temporally-Informed Analysis of Named Entity Recognition

Shruti Rijhwani, Daniel Preotiuc-Pietro


Abstract
Natural language processing models often have to make predictions on text data that evolves over time as a result of changes in language use or the information described in the text. However, evaluation results on existing data sets are seldom reported by taking the timestamp of the document into account. We analyze and propose methods that make better use of temporally-diverse training data, with a focus on the task of named entity recognition. To support these experiments, we introduce a novel data set of English tweets annotated with named entities. We empirically demonstrate the effect of temporal drift on performance, and how the temporal information of documents can be used to obtain better models compared to those that disregard temporal information. Our analysis gives insights into why this information is useful, in the hope of informing potential avenues of improvement for named entity recognition as well as other NLP tasks under similar experimental setups.
Anthology ID:
2020.acl-main.680
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
7605–7617
Language:
URL:
https://aclanthology.org/2020.acl-main.680
DOI:
10.18653/v1/2020.acl-main.680
Bibkey:
Cite (ACL):
Shruti Rijhwani and Daniel Preotiuc-Pietro. 2020. Temporally-Informed Analysis of Named Entity Recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7605–7617, Online. Association for Computational Linguistics.
Cite (Informal):
Temporally-Informed Analysis of Named Entity Recognition (Rijhwani & Preotiuc-Pietro, ACL 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2020.acl-main.680.pdf
Video:
 http://slideslive.com/38929195