Abstract
The distribution of knowledge elements such as entity types and event types is long-tailed in natural language. Hence, information extraction datasets naturally conform to a long-tailed distribution. Although imbalanced datasets can teach the model useful real-world biases, deep learning models may learn features that do not generalize to rare or unseen expressions of entities or events during evaluation, especially for rare types without sufficient training instances. Existing approaches to the long-tailed learning problem seek to manipulate the training data by re-balancing, augmentation, or introducing extra prior knowledge. In comparison, we propose to handle the generalization challenge by making the evaluation instances closer to the frequent training cases. We design a new transformation module that transforms infrequent candidate mention representations during evaluation using the average mention representation in the training dataset. Experimental results on classic benchmarks across three entity or event extraction datasets demonstrate the effectiveness of our framework.
- Anthology ID:
- 2023.eacl-main.97
- Volume:
- Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editors:
- Andreas Vlachos, Isabelle Augenstein
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1339–1350
- URL:
- https://aclanthology.org/2023.eacl-main.97
- DOI:
- 10.18653/v1/2023.eacl-main.97
- Cite (ACL):
- Pengfei Yu and Heng Ji. 2023. Shorten the Long Tail for Rare Entity and Event Extraction. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 1339–1350, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- Shorten the Long Tail for Rare Entity and Event Extraction (Yu & Ji, EACL 2023)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2023.eacl-main.97.pdf
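
The abstract describes transforming an infrequent candidate mention representation at evaluation time using the average mention representation from the training data, but it does not give the exact form of that transformation. The following is only a minimal sketch of one possible reading, assuming the transformation is a convex combination of the evaluation representation and the training mean. The function names `mean_representation` and `shift_toward_mean` and the weight `alpha` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def mean_representation(train_reps: Sequence[Sequence[float]]) -> List[float]:
    """Average a set of training mention vectors, dimension by dimension."""
    if not train_reps:
        raise ValueError("need at least one training representation")
    dim = len(train_reps[0])
    count = len(train_reps)
    return [sum(rep[i] for rep in train_reps) / count for i in range(dim)]


def shift_toward_mean(
    rep: Sequence[float], mean: Sequence[float], alpha: float = 0.5
) -> List[float]:
    """Move an evaluation-time vector toward the training mean.

    ASSUMPTION: a convex combination with weight `alpha` is used here purely
    for illustration; the paper's actual transformation module may differ.
    alpha = 0 leaves the vector unchanged, alpha = 1 replaces it by the mean.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(rep) != len(mean):
        raise ValueError("rep and mean must have the same dimension")
    return [(1.0 - alpha) * r + alpha * m for r, m in zip(rep, mean)]


# Toy usage with 2-dimensional vectors.
train = [[0.0, 0.0], [2.0, 4.0]]
train_mean = mean_representation(train)
rare_mention = [3.0, 0.0]
shifted = shift_toward_mean(rare_mention, train_mean, alpha=0.5)
```

In this toy setup, a larger `alpha` pulls the rare mention closer to the region of representation space that the model saw most often during training, which is the intuition the abstract gives for bringing evaluation instances closer to frequent training cases.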