A Preliminary Study on Environmental Sound Classification Leveraging Large-Scale Pretrained Model and Semi-Supervised Learning
You-Sheng Tsao, Tien-Hong Lo, Jiun-Ting Li, Shi-Yan Weng, Berlin Chen
Abstract
With the widespread commercialization of smart devices, research on environmental sound classification has gained increasing attention in recent years. In this paper, we set out to make effective use of a large-scale pretrained audio model and a semi-supervised training paradigm for environmental sound classification. To this end, we first put forward an environmental sound classification method whose component model is built on top of a large-scale pretrained audio model. Further, to simulate a low-resource sound classification setting where only limited labeled examples are available, we instantiate the notion of transfer learning with a recently proposed training algorithm (namely, FixMatch) and a data augmentation method (namely, SpecAugment) to achieve semi-supervised model training. Experiments conducted on the benchmark dataset UrbanSound8K reveal that our classification method yields an accuracy improvement of 2.4% over a current baseline method.
- Anthology ID:
- 2021.rocling-1.14
- Volume:
- Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
- Month:
- October
- Year:
- 2021
- Address:
- Taoyuan, Taiwan
- Editors:
- Lung-Hao Lee, Chia-Hui Chang, Kuan-Yu Chen
- Venue:
- ROCLING
- Publisher:
- The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
- Pages:
- 103–110
- URL:
- https://aclanthology.org/2021.rocling-1.14
- Cite (ACL):
- You-Sheng Tsao, Tien-Hong Lo, Jiun-Ting Li, Shi-Yan Weng, and Berlin Chen. 2021. A Preliminary Study on Environmental Sound Classification Leveraging Large-Scale Pretrained Model and Semi-Supervised Learning. In Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021), pages 103–110, Taoyuan, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
- Cite (Informal):
- A Preliminary Study on Environmental Sound Classification Leveraging Large-Scale Pretrained Model and Semi-Supervised Learning (Tsao et al., ROCLING 2021)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2021.rocling-1.14.pdf
- Data:
- ImageNet, UrbanSound8K
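
The semi-supervised recipe named in the abstract (FixMatch combined with SpecAugment) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the mask sizes, the 0.95 confidence threshold, and the use of SpecAugment-style masking as the "strong" augmentation are all assumptions made for illustration. The idea shown is that a confident prediction on a weakly augmented clip becomes a pseudo-label, and the model is then trained to reproduce that label on a strongly augmented view of the same clip.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of one example against an integer class index."""
    return -math.log(softmax(logits)[target])


def spec_augment(spec, max_freq_mask=2, max_time_mask=2, rng=random):
    """SpecAugment-style masking (hypothetical helper).

    `spec` is a list of frequency rows, each a list of time frames.
    One random band of rows and one random band of columns are zeroed.
    The input is left unmodified; a masked copy is returned.
    """
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]
    f = rng.randint(0, min(max_freq_mask, n_freq))   # frequency mask width
    f0 = rng.randint(0, n_freq - f)                  # frequency mask start
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time
    t = rng.randint(0, min(max_time_mask, n_time))   # time mask width
    t0 = rng.randint(0, n_time - t)                  # time mask start
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0
    return out


def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """FixMatch-style loss on an unlabeled batch.

    weak_logits / strong_logits: model outputs for the weakly and strongly
    augmented views of the same clips. Examples whose weak-view confidence
    is below `threshold` (an assumed value) contribute zero loss; the sum
    is averaged over the whole batch.
    """
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        probs = softmax(weak)
        confidence = max(probs)
        if confidence >= threshold:
            pseudo_label = probs.index(confidence)
            total += cross_entropy(strong, pseudo_label)
    return total / len(weak_logits)
```

In a full training loop this unlabeled term would typically be added, with a weighting factor, to the ordinary cross-entropy loss on the small labeled subset; the logits would come from the classifier built on the pretrained audio model.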