CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training
Linhao Dong, Zhecheng An, Peihao Wu, Jun Zhang, Lu Lu, Ma Zejun
Abstract
Speech or text representation generated by pre-trained models contains modal-specific information that could be combined for benefiting spoken language understanding (SLU) tasks. In this work, we propose a novel pre-training paradigm termed Continuous Integrate-and-Fire Pre-Training (CIF-PT). It relies on a simple but effective frame-to-token alignment: continuous integrate-and-fire (CIF) to bridge the representations between speech and text. It jointly performs speech-to-text training and language model distillation through CIF as the pre-training (PT). Evaluated on SLU benchmark SLURP dataset, CIF-PT outperforms the state-of-the-art model by 1.94% of accuracy and 2.71% of SLU-F1 on the tasks of intent classification and slot filling, respectively. We also observe the cross-modal representation extracted by CIF-PT obtains better performance than other neural interfaces for the tasks of SLU, including the dominant speech representation learned from self-supervised pre-training.- Anthology ID:
- 2023.findings-acl.566
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 8894–8907
- Language:
- URL:
- https://aclanthology.org/2023.findings-acl.566
- DOI:
- 10.18653/v1/2023.findings-acl.566
- Cite (ACL):
- Linhao Dong, Zhecheng An, Peihao Wu, Jun Zhang, Lu Lu, and Ma Zejun. 2023. CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8894–8907, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training (Dong et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2023.findings-acl.566.pdf