Abstract
This paper addresses the task of semantic frame induction based on pre-trained language models (LMs). The current state of the art is to use contextualized embeddings from models such as BERT directly and to cluster them in a two-step process (first lemma-internally, then over all verb tokens in the data set). We propose not to use the LM's embeddings as such but to refine them via a transformer-based denoising autoencoder. The resulting embeddings yield competitive results when clustered in a single pass, which clearly shows that the autoencoder already concentrates the embeddings on the information relevant for distinguishing event types.
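To make the pipeline concrete, here is a minimal sketch of the general idea, assuming PyTorch and scikit-learn rather than the authors' actual code: contextualized verb embeddings are refined by a small transformer denoising autoencoder (corrupt the input, reconstruct the clean embedding), and the refined vectors are then clustered in a single pass. The layer sizes, the Gaussian noise, the fixed cluster count, and the use of agglomerative clustering are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' implementation): refine
# contextualized verb embeddings with a transformer denoising autoencoder,
# then cluster the refined vectors in a single pass.
import torch
import torch.nn as nn
from sklearn.cluster import AgglomerativeClustering

class DenoisingAutoencoder(nn.Module):
    def __init__(self, dim=768, bottleneck=256, heads=8, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.to_bottleneck = nn.Linear(dim, bottleneck)
        self.decoder = nn.Linear(bottleneck, dim)

    def forward(self, x):
        h = self.encoder(x)           # (batch, seq, dim)
        z = self.to_bottleneck(h)     # refined, lower-dimensional embeddings
        return self.decoder(z), z

def train_step(model, optimizer, embeddings, noise_std=0.1):
    """One denoising step: corrupt the input, reconstruct the clean target."""
    noisy = embeddings + noise_std * torch.randn_like(embeddings)
    recon, _ = model(noisy)
    loss = nn.functional.mse_loss(recon, embeddings)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy stand-in for BERT verb-token embeddings; each embedding is treated as a
# length-1 sequence here, whereas in practice the autoencoder would typically
# see the full token sequence of each sentence.
verb_embeddings = torch.randn(512, 1, 768)
model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for epoch in range(10):
    train_step(model, optimizer, verb_embeddings)

# Single clustering pass over the refined embeddings (50 frame clusters is
# an arbitrary illustrative choice).
with torch.no_grad():
    _, refined = model(verb_embeddings)
labels = AgglomerativeClustering(n_clusters=50).fit_predict(
    refined.squeeze(1).numpy())
```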
- Anthology ID: 2023.iwcs-1.10
- Volume: Proceedings of the 15th International Conference on Computational Semantics
- Month: June
- Year: 2023
- Address: Nancy, France
- Editors: Maxime Amblard, Ellen Breitholtz
- Venue: IWCS
- SIG: SIGSEM
- Publisher: Association for Computational Linguistics
- Pages: 89–93
- URL: https://aclanthology.org/2023.iwcs-1.10
- Cite (ACL): Younes Samih and Laura Kallmeyer. 2023. Unsupervised Semantic Frame Induction Revisited. In Proceedings of the 15th International Conference on Computational Semantics, pages 89–93, Nancy, France. Association for Computational Linguistics.
- Cite (Informal): Unsupervised Semantic Frame Induction Revisited (Samih & Kallmeyer, IWCS 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/2023.iwcs-1.10.pdf